Boosting Variant Recognition with Light Semantics 
C5cile Fabre 
ERSS / \]Ddpt de Sciences du Langage 
Univ. Toulouse-Le Mirail 
5 alldes A. Machado 
31058 Toulouse Cedex, France 
cfabreOuniv-tlse2, fr 
Christian Jacquemin 
CNRS-LIMSI 
BP 133 
91403 ORSAY Cedex 
FI'&IICe 
j acquemin@limsi, fr 
Abstract 
A reasonably simple, domain-independent, 
large-scale approach of lexictd semantics to 
paraphrase recognition is presented in this pa- 
per. It relies on the enrichment of morpho- 
syntactic rules and the addition of fbur boolean 
syntactico-semantic features to a set of 1.,(}23 
words. It results in a significant enhancement 
of precision of 30% with a slight decrease in re- 
call of 10%. 
1 Overview 
The recognition ,of paraphrases and variants is 
an important issue in several areas of infornm- 
tion retrieval and text mlderstanding. Merging 
paraphrastic sentences ilnproves summarization 
by avoiding redundancy (Barzilay et al., 1999). 
Term variant conilation enhances recall in in- 
tbrmation retrieval by pointing at documents 
that contain linguistic variants of (tuery terms 
(Arampatzis et al., 1998). 
In (Jacquemin and Tzoukermann, 1999), a 
technique is proposed for the conflation of 
morpho-syntactic variants that relies solely on 
morphological and low-level syntactic features 
(part-of-speech category, munber agreement, 
morphological relationships, and phrase struc- 
ture). An analysis of these results shows the 
limitation of this approach: correct and incor- 
rect variants cannot be separated satisfactorily 
on a purely morpho-syntactic basis. Sonic addi- 
tional lexical semantics must be taken into con- 
sideration. 
In this study we propose a reasonably sim- 
ple, domain-independent, large-scale approach 
of lexical semantics to noun-to-verb variant 
recognition. It relies on the mere addition of 
two t)oolean syntactic features to 449 verbs and 
two boolean morpho-semantic features to 574 
nouns. It result,; in a significant enhancement 
of precision of 30% with a slight decrease in re- 
call of 10%. This new al)proaeh to semantics-- 
human-based, ettlcient, involving simple linguis- 
tic tbatures --convincingly illustrates the posi- 
tive role of linguistic knowledge in information 
processing. It confirms that verbs and their se- 
mantics play a significant role in document anal- 
ysis (Klavans and Kan, 1998). 
2 Morpho-syntactic Approach to 
Nomino-verbal Variation 
In order to illustrate the contribution of se- 
mantics to the detection of paraphrastic struc- 
tures, we focus on a specific type of wtriation: 
the vo.rl)al varbmts of Noun-Preposition-Noun 
terms o1" compounds in French. For example, 
les corttraintes rdsiduellcs darts les coques sont 
anaIysdes (the residual constraints in the shells 
~re mialyzed) is such a vert)al variant of analyse 
de corttraintc (constraint analysis). 
As a baseline tbr the extr~mtion of these vari- 
ants, we use a set of five morpho-syntactic trans- 
tbrmations fbr Noun-Preposition-Noun terms 
reported in (Jacquemin and Tzoukernlann, 
1999) (see Table 1). 1 We use the no- 
tation Ad(Ni)v for |;tie inorphological link 
between the initial term and the trans- 
ibrmed structure. It rel)resents any verb 
in the same morphological fanfily as Ni. 
For instance, in English, and according 
to the CELEX database, Ad(analysis)v = 
{to analyze, to psychoanalyze}. 
Given a NI P2 N8 structure, these transt'ornm- 
tions are obtained through corpus-based tuning 
1The following symbols are used for syntactic cate- 
gories: N (nouI0, A (adjective), Av (adverb), V (verb), C 
(coordinating conjunction), P (pret}osition), and D (de- 
tcrminer). In the regular exl}ressions, ? denotes option- 
ality and I disjunction. Morphologically related words 
are underlined. 
264 
Tal)le 1: Mort)he-syntactic (MS) Variants of N1 P2 Na Terms 
NheadToV: Ad(N~ )v (Av: (P'? \]) \[P D '?) A:) N:~ 
stabilisation de priz (price sl;al)ilization) ~ stabiliser le'm's priz (stal)ilize their prices) 
NheadToVRev: Na(A? (PA?N(A(CA)?)?)? (CI)': Av': A'?NA?)'?V':V': Av':)A4(N1)v 
abattage d'arbre (tree cutting) -+ m'bres oat dtd abattus (trees have been cut down) 
NmodifToVl: N1 ((Av ? A (C av '? A)'?) ': V': P) fl4(Na)v 
mdth, ode d'.dvaluatio~ (method of evaluatio~ 0 ~ mdth, ode pour d'valuer (method tbr ewduating) 
NmodifToV2: Nj (A': (V\[ (l) D "? (Av': A) '? N) '?) (Av '? A)': Av':) Ad(Na)v 
zone de ddstabilisation (region of destal)ilization) --+ zone d&tabilisde (destat)ilized region) 
NmodifToVRev: Ad(Ua)v (Av '? (P'? l)) \[ (P D':)A':) N~ 
tcrnpdrat,ure de chau./.\[age (temi)erature fi)r heating_') 
-+ chau/.\]~s a h, aute tempdrat'urc, (heated at high teml)eratures) 
and correspond basically to tbur configurations: 
1. either N1 or Na (re, st)cct;ivcly head and 
modifier of the initial term) is I;r~msformed 
into a morphologically related verb V, 
2. the order of the two content words is re- 
tained or reversed, 
3. the dependency relation 1)etween 1;11(; two 
initial 11011118 is preserved. 
For instance, rule NheadqlbVl/,ev corresl)onds 
to transfi)rmations in whi(:h the, head noun 
is morphologically re,1;~t(;d to the verl) and 
t:h(; ord('r of the two words is re, verse(l; rul(; 
Nmodit'FoV (modifier transfi)rmed, order r(;- 
rained) has been divided into two sul)ruh;s: the 
first one - Nmo(tifYoV1 - re, quires the insertion 
of a pr(;positiou just 1)eft)re th(; verbal form. 
3 The Limits of the 
Morpho-syntactic AI)proach 
In the tirst step of this work, we ext)e(:ted th(, 
precision of wu:iant recoglfition to be controlled 
in two ways: firstty, by searching fi)r multi- 
term variants in which the two content words of 
the initial term are foulM, directly or via mor- 
1)hological transformation. Se(:ondly, by dell1> 
ing morpho-syntacti(: pa|;te.rns of variation in 
I;erms of l)art-of-sl)eech strings that are allowed 
to come in 1)e.tween these t;wo COlll;eltt words. 
Yet, the sequences found on su(;h a morl)ho- 
synl;acti(: basis prove to 1)e of wtrying quality 
regarding their at/ility to t)rovide t)arat)hrases 
of the initial l;erm. Consider for instance some 
of l;he vm:imlts del, eeted for the term comparai- 
son de rds'ultat (comparison of results), in which 
only t;11(; \[irst; |;we sequences are good variants: 
compare les rdsultats ((:onlpare the results) 
(rule Nhead~lbV, pattern A4 (N1)vDNa) 
r&ultats ezpdrirnc.ntauz sent compar& (exI)er- 
imental results are compared) (rule Nhead- 
toVRev, 1)atte.rn NaAVA4 (N~)v) 
com, pard.s a'ux 'rds'.,ltats (COmlmred to 
the results) (rule NheadToV, pattern 
Jt4(N1)vPN:{) 
rds'ultd d"unc eomparaison (resulted from ;t 
comparison) (rule NModiiToVRev, pattern 
J~d (Na)vPDN1 ) 
Such examples show that morpho-syntactie 
patterns ;~re 1;oo coarse-gr~tined to ensure l;hat 
the dependency relation between the two piv- 
ots (results is the object of the prediea£e com- 
parison) is maintained. When trying to detine 
linguistic criteria to ewfluate such w~riants, it; 
apt)ears that the frontier between good and load 
variants lies between those that preserve the ar- 
gument relation 1)etween the two content words 
and those that disrupt it. This means that, in 
the verbal wtriant, the. argument relation be- 
tween the verb and l;he noun must be l;he same 
as t;he relal;io11 between the deverl)al llOUll add 
the othe.r noun in the nominal term. 
None of the five rules ensures that the subcat- 
egorization frame is preserved. For instance, if 
we consider t;11o rule NModifI'oVl{ev, we find se- 
265 
quences that obey this constraint and sequences 
that violate it2: 
cr'itdrc d'&aluation (evaluation criterion) -+ 
dvalv, d selon les crit~r'cs (evahmted according to 
the criteria) 
syst~me d'dvaluation (evaluation system) *--+ 
dvaht6 lc syst&ne (evaluated the system) 
In the second case, the transtbrmation is un- 
acceptable because the instrumental relation ex- 
pressed in the nonfinal term becomes an ob- 
ject relation in the verbal sequence. Even 
when word order is preserved, the relation be- 
tween the pivots can be totally different in the 
term and its transformation, as i.u: contrgIe 
d'installation (installation control) and contrgle 
ccntralisd installd (installed centralized control) 
(rule NModitToV2). 
Our aim was to tbrmulate additional con- 
straints in order to control argument structure 
preservation. We thus had to cope with prob- 
lem of handling nonfinal t)hrases (NP) in which 
one of the elements is morphologically linked to 
a verb. In French, as in English, the seman- 
tics of these nominal phrases is an issue tbr lin- 
guistic description: the two nouns can be linked 
by the whole range of argmnent-predicate rela- 
tions, and very few linguistic elements can be 
used to decide what relation is expressed. Here 
is a brief list of the configurations that are likely 
to appear in such NPs: 
- the second noun is the object of the first; one: 
comparaison de rdsultat (comparison of result) 
- the second noun is the subject of the first 
one: augmentation de I'intcv, sitd (increase in in- 
tensity) 
- the second noun is an adjunct: tr'aitcment g 
la chaleur (treating with heat) 
the first noun is an adjunct: taux 
d'augmentation (increase rate) 
Our aim was to find a way to use surface lin- 
guistic knowledge, as required in such an area 
of NLP, to deal with the interpretation of these 
phrases. 
4 Light Semantics for 
Nomino-verbal Variations 
Our approach consisted of two steps: firstly, 
defining semantic clues tbr accepting or discard- 
sin what follows~ the symbols --~ and *----> respectively 
indicate correct and incorrect transformations 
ing variants and, secondly, defining new varia- 
tion patterns based oll these features. 
4.1 Filtering Criteria 
First, using linguistic results on the semantics 
of French NPs (Fabre, 1996; Bartning, 1990), 
we identified predicate-argument configurations 
that cannot be matched by a given pattern ('re- 
ject' heuristics in the sense of (Lapata, 1999)). 
For example, when rule NmodifToVRev applies, 
N1 de N3 terms cannot be i)araphrased by ver- 
bal sequences in which N1 is the ol)ject of the 
verb, as in: ezp&iencc d'utilisation (experiment 
of use) *-+ utilisait 'uric ezpdrier~,cc (used an ex- 
periment). In such a configuration, only non-- 
thematic arguments (adjuncts) of the deverbal 
noun may be tbund inside the NP. 
Similarly, when rule NheadToVRev applies, 
N\] de N3 terms cannot be paraphrased by ver- 
bal sequences in which N\] is the subject of a 
transitive verb, as in: utilisation de l'ezp&'icnce 
(use of experiment) *-+ czpdriencc utilisant (ex- 
t)eriment using). 
This configuration provides variants only 
when the verb is intransitive or ergative: erga- 
tive verbs allow tbr alternations of the tbrm: NP 
V (la dcnsitd au.qrncntc) / one V NP (on awl- 
monte la dcnsitd). 
In this case, the tbllowing transtbrmation is 
correct: augmentation de densitd (density in- 
crease) / de,,,,s'itd av,9'mentc (density increases). 
4.2 Enriched Metarules 
Once it has 1)een established wtfich transforma- 
tions should be rejected, we searched tbr sur- 
face linguistic clues that could help us to fil- 
ter out these undesirable variants. It led us to 
the redefinition of the metarules, in two ways: 
putting additional constraints on the part-of 
speech strings that can intervene between the 
two pivots, and defining new features to add lin- 
guistic control upon the application of the rules. 
These t~atures are: the prepositional form, the 
morphological type of the noun, the transitivity 
of the verb, and the voice (active versus pas- 
sive). 
Here are two examples tbr the redefinition of 
the metarules (fllrther details and examples are 
given in table 3): 
rule NmodifToVRev In this case, the 
metarule is transtbrmed into a single 
266 
17elilled rule, in whi('h the ('oml)ilm|:ion 
of parts of sl)eech is mot(; res|;riel;ed: a 
preposition is required to elinfinate object 
rein|ions from the verbal phrase. In ~ul- 
dition, the morphologic~dly comt)lex nora1 
must be ~ \])recessive deverl)M. %'anstbr- 
martens such a.s czpdricnce d',utili.srltion 
*-> ul, ili.sa, it. 'u, ne c.zpdricncc, a.re filtered 
()u|;. 
rule NheadToVRev Here, the initial 
metarule is refined into three em'iched 
rules, mainly by means of lexical con- 
straints on the verb tbrm. Only N1 P2 Na 
t;(;171118 whe, re 1)2 = dc m:e |;real;ed. \]if the 
v(;rl) is transitive, l;helt the verb forln nlUSI; 
l)e ;t past t)m;l;i(;il)le (rule Nlw, a(tt()Vl/.('v- 
\])ass), so l;ha, t 1;t1(', object relation still hol(ls 
in the vm'iant, if the verl) is intrnnsitive 
or ergative, then the verl) fornl nms|; 
l)e active, st) |;h~l; the sut)je(:t; rel~d;ion 
holds (rule Nhea(ltoVll.ev-A(:tSiml) (resp. 
NheadtoV\]l,ev-ActComp) for simt)le (resl). 
(:omt)h;x) verb fornls). '.l~:allSl'orm~tions 
such as utilisation dc l'c.zp(;ric'nce *-+ 
czpd'ricncc 'utilisant ~r('~ filtered ()tiC;. 
The r(;iinenlenl; of the mel;m'ul(',s introduced 
four linguistic \]b&i;llr(*,s whi('h had to 1)e encode(t 
in the h',xi('on (see Table 2), nmn(',ty: 
• 1;11(', morl)hologi(:al nature of the noun: th(! 
noun is either non (h:verl)al or devert)~d. In 
the l~d;ter ease, it; may (:orrest)on(t t() ;m 
agent deverbal, which reihrs to the agent 
of the verb, e.g. 'utili,sateur (user), or to ~ 
t)rocessive deverbM, whMt reibrs to the a(:- 
|ion (tenoted by the verb, e.g. utilisation 
(.se). 
• the transitivity of the verb: intr;msitive 
mid ergative verbs are marked in the lexi- 
(;Oll. 
This mine|at|on task is not tinm-(:onsuming 
(al)otd; 3 hours for 1.,023 words) and could be 
parl;ly automated: characteristic endings (:ould 
hell) to detect processive mid agent deverb~fls. 
In addition, intrmlsitive mid ergative verbs form 
a sm~fll set of the vert)al lexicon (8% of the 
ver|)s) which is likely to 1)(', l)artly (lom~dn- 
indel)endent. 
5 Experiments and Evaluations 
In this section, we ew~luate the variations pro- 
duced fl'om the two preceding sets of metarules: 
initial morpho-synt~mti(" wn'iations (henceforth 
MS) m~d new wn:i~tions enriched through light; 
semantics (henceforl;h MS+S). i 
The wtriald;s ",u:e ot)t~dne(1 Kern a 13.2 million- 
word (:orpus (:omposed of s(:ientiti(: al)stracl;s 
in the agricultural dora;fin (in French) ;rod a 
set of 11,452 terms. :~ The corlms is mlnlyzed 
through SYLEX, a shallow parser l;h~t buihts 
limited 1)hrase structures and associ~tes each 
word with mt unambiguous syntactic (:ategory 
and a, l(;mma. ~Ibrms are acquired from the out- 
|;ltl'es a,l:e sele,(:ted nn(t only terms that occur ~l: 
lea.st three times in the ('ortms m:e retained. 
The nunll)ers of variants exi;r&c|;ed through 
MS nnd MS+S ~u'e reporl;ed in ~l~fl)le el. 
They are re:ranged in su('h ;~ w~y (;hat ('or- 
responding wu'iations are aligned horizontnlly. 
For instance, each of the three MS+S vari- 
ations Nhea(lToV-Conq), NheadToV-SimI) or 
NheadtoV-l)rel ) is a refinenmnt of the MS vari~> 
|ion Nhea(lToV. In other words, the set of wtri- 
ants extracted by these three rich llle|;a, ru\]es is 
in('hl(led into the set of variants exl;ra~cl;ed l)y 
th(', 1)oor met;re'nit. These two sets are not eqmd 
since the rich metm:uh'~s are mnde more sele(:tive 
th;m the origimfl me(mule fl:om whi(:h they m:e 
derived. 
In addition to the oul;tm(; of ri(:h mid poor 
met~mfles, T;fl)le 4 shows, in |;he third col- 
umn, the mnnber of co-occurrences associated 
with these metarules. Co-occurrences m'e the 
least selective filters associated with morpho- 
syn|;~mti(: varimlts; they nre ext/ected to extract 
all the l)Ossible ('orrect nomino-verb:fl variations 
(recall value 1.0). Given a N1 Pu Na term, these 
co-occurrences corresl)ond to a configuration in 
which N1 co-occurs with a verb that is roof 
phologically related to Na or Na co-occurs with 
~r verb related to N~. Co-occurrences are ex- 
tra(:ted from a ll-word window (9 intervening 
words). These co-occurrences are used to eval- 
m~te the recall wflues of the tiltering metarules. 
awe arc grateflfl to Xavier Polanco, Jean Royautd and 
Lmncnt Schmitt (INIST-CNRS) for t)roviding us with 
this s(:icntitic corpus. 
267 
Table 2: Semantically Enriched Lexicon. 
Word Processive Deverbal Agent Deverbal Intransitive Ergative 
abaisser - D - A - I - E 
abaissement +D -A -I -E 
absorber -D -A -I -E 
absorbe'ar +D +A -I -E 
accorder - D - A - I - E 
accord +D -A -I -E 
accumuler - D - A - I - E 
aceumulateur +D +A - I - E 
accumulation ÷ D - A - I - E 
accdl&'er -D -A -I +E 
Table 3: Semantically Enriched Morpho-syntactic (MS+S) Variants of N1 P2 N:t Terms 
NheadToV- Comp: avoir Av '~ 34 (N 1 ) V Av ? D A t N3 
{(N1 d,,ev) = proeessive A P2 =- d,e A (34(NI)v tense)= pastpartieiple} 
comparaison de rdsultats (comparison of' results) 
--~ a compard les rdsultats (has compared results) 
NheadToV-Simp: 34(N1)v Av ? D A ? N3 
{(N1 dev) = proeessfve A P2 = de A (34(N1)v tense) ¢ pastparticiple} 
dvaluation de risques (ewduntion of risks) --+ (~valv, er les risques (to ewfluate risks) 
NheadtoV-Prep: 34(Nl)v Av ? P2 D A ? N.~ 
{(Nt dev) = processive} 
exposition d la lumi&'e (exposure to light) --+ ezposdes it la lumidre (exposed to light) 
NheadtoVRev-Pass: N3 (A ? (P A ? N (A (C A)?)?) ': (C D ? Av ? A ? N A?) ? V ? dtre': Av ?) 2td(N1)v 
{(N3 agreement) = (3.d(N1)v agreement) A P2 = de A (N1 de',,) =-proces,sive A 
(34(N )v tense) = p ,,stp rtie,:ple A (M(N )v = 
r@artition de ch, ar.qe (weight distribution) --+ eh, arge @alement r@artie (equally distributed weight) 
NheadtoVRev-ActSimp: Na(A?(PA?N(A(CA)?)?)?(CD'~Av?A?NA?)?)Ad(N1)v 
{P2 = de A = p,'o essi e A (34(N )v tense)¢p stpo, rtie',:ple A 
(Jbl (N1)v valence) = (er.qativelintransitive) } 
chute de tempdrature (drop in temt, erature ) --+ tempdrature ch,'ute (temperature drops) 
NheadtoVRev-ActComp: Na (n ? (P A t N (A (C A)?)':) ? (C D ? Av': A': N A':)': avoir ? Av ?) 3d(N1 )v 
{P2 = de A (N1 dev} = processive A {3d(N1)v tense) = pastparticiple A 
{3d (N1)v valence) = (ergativelintransitive) } 
fermentation de jus (juice fermentatio,t) --+ jus de raisins fermentds (fermented grape juice) 
Precision and Recall 
In order to calculate the precision and recall of 
the rich and poor metarules and to estimate the 
gains of semantic enrichment, a set of 1,000 co- 
occurrences has been randomly chosen among 
the 159,898 co-occurrences retrieved by the sys- 
tem. They have been divided into three sets: 
S1 (500 co-occurrences) and S2 and S~ (250 co- 
occurrences). S~ has been evaluated indepen- 
dently by the two judges (i.e. the two authors) 
268 
Tabh; 4: Counts of varinnts of NI P2 Na terms 
MS MS+S Co-occurrences 
874 NheadToV-Coml) 
38,693 NheadToV 15,583 NheadToV-Silnp 
7,644 NheadtoV-Prep 
14,24:8 NheadtoVRev-Pass 69,056 NIN2toV1N2 
20,453 Nhead2bVI/.ev 197 Nhea(ltoVI{ev-ActSimp 
26 NheadtoVll.ev-Act Coral) 
6,803 NlnodifPoV1 2,749 NlnoctitWoV1-Ppr 42,882 NIN2toN2V1 
1,160 Nmodif\]}oV2-Infl } 
2,588 Nmodif£oV2 0 NmoclitXbV2-Inf2 26,971 N1N2toNIV2 
1 NmodifYoV2-hff3 
~( 9,363 NmoditToVI/.ev 1,892 Nmo(tifPoVIl.ev-Prep 20,989 NIN2toV2N1 
77,900 44,374 159,898 
ill order to test the level of agrcelnent and $2 
and S. 5 have been ewfluated separately by only 
one judge each. Each cooccurreuce has been 
marked as t)ositive (~ correct variation), nega- 
tive (an incorrect variation) or inevaluable. In- 
ewfluable cases correspond either to tagging er- 
rors or to iÀl(;orrect terms such ;ts (:oq'ttc de form(. 
(shell of shape) wlfich is an incoml)lete tca'm 
structure \])ec~utse it shouht t0e followed by an 
adjective such as coqu, c dcform, c oval(', (oval- 
shaped shell). Only the cases of ;tgreeul(u,t l)e- 
|;ween the two judges are used for the COml)uta- 
tion of rex:all and t)recision values. 
The achtition of semantics results in an in- 
(:,'ease of precision of 0.29: from 0.499 fbr MS 
to 0.789 for MS+S. The corresl)onding decrease 
of recall is nm(:h smaller: 0.11 from 0.696 for 
MS to 0.586 for MS+S. Pre(:ision and recall can 
t)e (:onfloine(t into a single me,mute such as the 
eifeetiveness measure E~ given by Fonmfla (1) 
in which t~ is a parameter (0 _< a < 1) (van 
Rijsbergen, 1975): 
Ea = 1 - (1) 
E~ varies fl:om 0 to 1.0. Low wflues of Ea cor- 
respond to combined high recall and high preci- 
I in order to assign an equal sion. If we use oe = 
trot)or|ante to precision and recall, the E1 val- 
ues are 0.419 fi)r MS and 0.327 for MS+S. They 
indicate that the addition of semantics has sig- 
nificantly improved the quality of w~riant ex- 
traction. Detailed values of recall and precision 
arc; showll ill Table 5. 
Agreement on Judgment 
Agreement on ~ classification task can 1)e mea- 
sured through the kappa coefficient (K). It 
ewduato.s the pairwise agreement mnong a set; 
of coders making category .iudgment, correcting 
tbr expected chance agreement (Carletta, 1996). 
In our case the results of the ternary class|It- 
cation task are given by Table 6. The simple 
kappa cecil|(tent is 
Po- P,! 
K : -- (2) I-P~. 
7232. in which P0 = Ei~ and < = Ei( ,/ ~2~-) (Co- 
hen, 1960). P0 is the proportion of times the 
coders agree and I~, is the proportion of tiines 
we would expect them to agree by chance. The 
value of the kappa coetficient is 0.91 indicating 
a good reliability of the evaluation pertbrmed 
by the two independent .judges. 
6 Conclusion 
On a linguistic point of view, this experiment 
demonstrates that NLP applications can pro- 
vide new issues tbr the description of linguis- 
269 
Table 5: Precision and recall in variant extraction for MS and MS+S variations 
PMS PMS+S RMS RMS+S 
0.438 NheadToV 
0.735 NheadToVRev 
0.111 NmodifroV1 
0.769 NmodifToV2 
0.448 NmodifToVRev 
{ 
{ 
0.875 
0.938 
0.565 
0.902 
1.000 
0.308 
1.000 
O.O00 
NheadToV-Comp 
NheadToV-Sim t) 
NheadtoV-Prep 
NheadtoVRev-Pass 0.806 0.664 
NheadtoVRev-ActSimp 
NheadtoVRev-ActComp 
NmodifToVl-Pt)r 0.674 0.578 
NmodiiToV2-hffl } 
NmoditToV2-hff2 0.357 0.214 
NmoditToV2-Inf3 
NmoditToVI{ev-Prep 0.765 0.765 
0.499 0.789 0.696 0.586 
Table 6: Frequencies of t)airwise judgments for 
the ternary classification of nomino-verbal w~ri- 
ations (, = inevahmble, + = correct;, - = in- 
correct). 
nij * + - hi. 
. 120 9 1 130 
÷ 1 184 6 191 
- 4 10 165 179 
n.j 125 203 172 500 
tic phenomena. The problem of linguistic vari- 
ation in information processing forces the lin- 
guist to reconsider parat)hrase and trm~sf'orma- 
tion mechanisms in a new perspective, based 
on real linguistic data and on systematic cort)us 
exploration. The paraphrase judgment is eval- 
uated in a new way, from a practical point of 
view: two sequences are said to be a paraphrase 
of each other if the user of an information sys- 
tem considers that they bring identical or sin> 
ilar information content. Regarding linguistic 
methodology, this work led us to find "light" so- 
lutions in terms of lexical encoding to describe 
complex semantic t)henomena. This approach is 
pronfising because it demonstrates that linguis- 
tic knowledge can really enhance the results of 
term recognition beyond the ,norphology level, 
and that semantics can be taken into account 
to sonm extent. 

References 

A. T. Arampatzis, T. Tsoris, C. H. A. Koster, and 
Tit. P. van der V~reide. 1998. Phrase-based infof 
mation retrieval. Information Proccssin9 '~ Mana.qe- 
ment, 34(6):693 707. 

Inge Bartning. 1990. Los syntagmes l)inoininaux en de - 
les types interprdtatifs subjectifs et agentifs. In Pro- 
cecdings, dixidmc congr~s des romanistcs .scandinavcs. 

Regina Barzilay, Kathleen McKeown, and Michael Elhadad. 1999. Informational fusion in the context 
of multi-document summarization. In Proceedings of 
ACL'99, pages 55-557, University of Mawland. 

Jean Carletta. 1996. Assessing agreement on classification tasks: The kappa statistics. Computational Linguistics, 22(2):249-254. 

J. Cohen. 1960. A coefficient of agreement for nominal 
scales. Educational and P.sychological Measurement, 
20(1):37-46. 

Cdcile Fabre. 1996. Intcrprdtation automatiquc des 
sdquences binominales en fran~:ais et en an.qlais. 
Ph.D. thesis, Universit5 Returns I. 

Christiml Jacquemin and Evelyne Tzoukermmm. 1999. 
NLP ibr term variant extraction: A synergy of mor- 
phology, lexicon, and syntax. In Tomek Strzalkowski, 
editor, Natural Language Information Retrieval, pages 
25-74. Kluwer, Boston, MA. 

Judith Klavans and Min-Yen Kan. 1998. Role of verbs 
in document analysis. In Proceedings of COLING-ACL'98, pages 680-686, Universitd de Montrdal, Morttreal, Canada. 

Maria Lapata. 1999. Acquiring lexical generalizations 
from corpora: A case study for diathesis alternations. 
In Proceedings of ACL'99, pages 397-404, University 
of Maryland. 

C. J. van Rijsbergen. 1975. In\]ormation Retrieval. But- 
terworth, London. 
