POS Tagging Using Relaxation Labelling 
Llufs Padr6 
Departamenl; de Llenguatges i Sistemes Informgtics 
Universitat Polit6cnica de Catahmya 
Pan Gargallo, 5. 08071 Barcelona, Spain 
padro~lsi, upc. es 
Abstract 
Relaxation labelling is an optimization 
technique used in many fields to solve 
constraint satisfael,ion problems. The al- 
gorithm finds a combination of values 
for a set of variables such that satis- 
fies -to the maximum possible degree- a 
set of given constraints. This paper de- 
scribes some experiments performed ap- 
plying it to POS tagging, and the results 
obtained, it also ponders the possibil- 
ity of applying it to Word Sense Disam- 
biguation. 
1 Introduction and Motivation 
Relaxation is a well-known technique used to solve 
consistent labelling problems. Actually, relax- 
ation is a family of energy-function-minimizing al- 
gorithms closely related to Boltzmann machines, 
gradient step, and Hopfield nets. 
A consistent labelling problem consists of, giwm 
a set of variables, assigning to each variable a la- 
bc'l compatible with the labels of the other ones, 
according to a set of compatibility constraints. 
Many problems can be stated as a labelling 
problem: the travelling salesman problen 4 n- 
queens, corner and edge recognition, image 
smoothing, etc. 
In this paper we will try to make a first, insight 
into applying relaxation labelling to natural lan- 
guage processing. The main idea of the work is 
that NLP problems such as POS tagging or WSD 
can be stated as constraint satisfaction problems, 
thus, they could be addressed with the usual tech- 
niques of that field, such as relaxation labelling. 
It seems reasonable to consider POS tagging or 
WSD as combinatorial problenrs in which we have 
a set of variables (words in a sentence) a set, of 
possible labels for each one (POS tags or senses), 
and a set of constraints for these labels. We might 
also coinbine both problems in only one, and ex- 
press constraints between the two types of tags, 
using semantic information to disambiguate POS 
tags and visceversa. This is not the point; in this 
paper, but it will be addressed in fllrther work. 
2 Relaxation Labelling Algorithm 
Relaxation labelling is a generic name for a family 
of iterative algorittuns which perform function op- 
timization, based (m local infi~rmation. See (Tor- 
ras 89) for a clear exposition. 
Let V = {vl, v2,..., v,,~} be a set of variables Let t 
= , ,,,~ } be the set of possilfle 
labels for variable vi. 
Let Cb' be a set: of constraints between the la- 
bels of the variables. Each constraint C C CS 
states a "compatibility value" C,. ibr a colnbina- 
lion of pairs variable-label. Constraints can be of 
any order (that is, any number of variables may 
be involved in a constraint). 
The aim of the algorithm is to find a weighted 
labelling such that "global consistency" is maxi- 
mized. A weighted labelling is a weight assigna- 
tion for each possibh', label of each variable. Max- 
infizing "Global consistency" is defined as maxi- 
)i )i is the weight mizing ~j t j x Sij , Vvi. Where I j 
for label j in wtriable vi and Sij the support re- 
ceived by the same combination. The support for 
a pair w~riable-label expresses how compatible is 
that pair with the labels of neighbouring variables, 
according to the constraint set. 
The relaxation algorithm consists of: 
• start in a randoln weighted labelling. 
• fbr each variable, compute the "support" that 
each label receives froln the current .weights 
for the labels of the other variabh;s. 
• Update the weight of each variable label ac- 
(:ording to the support obtained. 
• iterate the process until a convergence crite- 
rion is met. 
The support computing and label weight chang- 
ing must be perfornmd in parallel, to avoid that 
changing the a variable weights would affect t;he 
support colnputation of the others. 
The algorithm requires a way to compute which 
is the support for a wn'iable label given the others 
877 
and the constraints. This is called the "support 
function". 
Several support, functions are used in tire liter- 
ature to define the support received by label j of 
variable i (Sij). 
Being: 
1"1 ?'d R~j = {," I r -- \[(v,,, tk~),..., (~, *}),..., (v,.,, t.k,,)\] 
tile set of constraints on label j for variable i, 
i.e. the constraints formed by any coinbination of 
pairs variable-label that includes the pair (vi, t}). 
rl l)k, (m) the weight assigned to label t~.~ for variable 
v,,~ at time m. 
TO(V) the set of all possible subsets of variables in 
V. 
R~ (for G E T°(V)) the set of constraints on tag 
i ieor word j in which the involved variables are 
exactly those of G. 
Usual support flnmtions are based on coinput- 
ing, for each constraint r involving (vi,t}), tile 
"constraint influence", Inf(r) = C,. x p~'(m) x 
... x p~Z., (m), which is the product of tile current 
weights for the labels appearing the constraint 
except (vi,t}) (representing how applicable is tile 
constraint in the current context) multiplied by C.,. 
which is the constraint compatibility value (stat- 
ing how compatible is the pair with the context). 
The first formula combines influences just 
adding them: 
(1.1) Sij = ~ Inf(r) 
rGRij 
The next fornmla adds the constraint influences 
grouped according to the variables they involve, 
then multiplies the results of each group to get 
the final value: 
(1.2) &-- 11 
The last formula is tile same than the previous 
one, but instead of adding the constraint influ- 
ences in the same group, just picks tile maximum. 
(1.3) Sij = II max {Inf(r)} 
The algorithm also needs art "updating func- 
tion" to compute at each iteration which is tile 
new weight for a variable label, arrd this compu- 
tation must be done in such a way that it can be 
proven to meet a certain convergence criterkm, at 
least under appropriate conditions 1 
Several formulas have been proposed and some 
of them have been proven to be approximations of 
a gradient step algorithin. 
Usual updating flmctions are the following. 
~Convergence has been proven under certain con- 
ditions, but in a complex application such as POS 
gagging we will lind cases where it is not necessarily 
achieved. Alternative stopping criterions will require 
further attention. 
Tile first formula increases weights for labels 
with support greater than 1, and decreases those 
with support smaller than 1. The denonfinator 
expression is a normalization factor. 
(2.1) p}(m + 1) = ~;~ where S,ij > 0 
i 
k I 
The second formula increases weight for labels 
with support greater than 0 and decreases weight, 
for those with support smaller than 0. 
~ (~,,) x (1 + &j) (2.2) + 1) = 
k=l 
where-l<Sij <_ +1 
Advantages of the algorithm are: 
• Its irighly local character (only the state 
at, previous time step is needed to compute 
each new weight). This makes the algorithm 
highly parallelizable. 
• Its expressivity, since we state the problem in 
terms of constraints between labels. 
• Its flexibility, we don't have to check absolute 
coherence of constraints. 
• Its robustness, sin(:(,' it can give an answer to 
problenls without an exact solution (incom- 
patible constraints, insufficient data...) 
• Its ability to find local-optima solutions to 
NP problems in a non-exponential time. 
(Only if we have an upper bound for the nun> 
ber of iterations, i.e. convergence is fast, or 
the algorithm is stopped after a fixed number 
of iterations. See section 4 for further details) 
Drawbacks of tire algorithm are: 
• Its cost. Being n the number of variables, 
v the average number of possible labels per 
variable, c the average number of constraints 
per label, and I tire average number of iter- 
ations until convergence, tile average cost is 
n x v x c x i, an expression in which the inulgi~ 
plying terms ,night; be much bigger than n if 
we deal with probh',ms with many values and 
constraints, or if convergence is not quickly 
achieved. 
• Since it acts as an approximation of gradi- 
ent step algorithms, it has similar weakness: 
Found optima are local, and convergence is 
not always guaranteed. 
• In ge, ne, ral, constraints must be written mann- 
ally, since they at(', the modelling of the prob- 
lem. This is good for easily modelable or 
reduced constraint-set problems, but in the 
case of POS tagging or WSD constraints are 
too many and too complicated l;o be written 
by hand. 
8 '7 8 
• The diificulty to state which is the "(:omt)at- 
ibility value" for each constraint. 
• The, difficulty to choose the support and up- 
dating fun('tions more suitable for ea(:h l)ar- 
titular prol)lem. 
3 Application to POS Tagging 
In this section we expose our application of relax- 
ation labelling to assign 1);u't of speech tags to the 
words in a sentenc, e. 
Addressing tagging problems through ot)timiza- 
tion methods has been done in (Schmid 94) (POS 
tagging using neural networks) and in (Cowie et 
al. 92) (WSD using sinmlated annealing). (Pelillo 
& I{efice 94) use a toy POS tagging l)i'oblenl to ex- 
t)eriment their methods to improve the quality of 
eoInt)atibility coeflh:ients for the constraints used 
by a relaxation labelling algorithm. 
The model used is lie tblh)wing: each word ill 
the text is a variable and may take several hfl)els, 
which are its POS tags. 
Since the number of variabh~s lind word po- 
sition will vary from one senten(:e to another, 
constraints are expressed in relative terms (e.g. 
\[(vi, Determiner)(v.i ,,, Adjective)(vi ,2, Nou'r0\]). 
The Conshnint Set 
l{elaxation labelling is a.bh~ to deal wil;h con- 
straints 1)etween any subset of wn'ial)les. 
Any rehttionship between any subset of words 
and tags may 1)e expressed as constraint and used 
l;o feed th(: algorithm. So, linguisl;s are fre(, to ex- 
press ;my kind of constraint an(l are not restricted 
I:o previously decided patl;erns like in (Brill 92). 
Constraints for subsets of two and three vari- 
ables are automati(:ally acquired, and any other 
subsets are left, to the linguists' criterion. That is, 
we are establishing two classes of constraints: the 
autoinatically acquired, and the mmmally writ- 
ten. This means that we ha.ve a great model flex- 
ibility: we can choose among a completely hand 
written model, where, a linguist has written all 
l;he constraint;s, a comph~tely mm)mat, ically lie- 
rived model, or ally interinediate (:olnl)ination of 
(',onstrailfl;s fl'om ea, ch (;ype. 
We can use the same information than HMM 
taggers to ot)tain automatic (:onstraints: the 
1)robability 2. of transition fl'om one tag to an- 
other (bigram -or binary constraint- probability) 
will give us an idea of how eomt)atible they are in 
the positions i and i + 1, ;rod the same for l;rigrain 
-or ternary cbnstraint- probabilities. Extending 
~Esl;imated fi'om occurrences in tagged (:ort)or~t. 
W(: prefer tll(: use of supervis(:d training (sin(:e large 
enough corpora arc available) because of the diffi- 
culty of using an unsut)ervised method (such as Bmm> 
Welch re-estimation) when dealing, as in our case, 
with heterogeneous constraints. 
this to higher order constraints is possil)le, but; 
would result in prohibitive comtmt;ational costs. 
l)ealing with han(l-written constraints will not 
be so easy, since it; is not obvious \]low to com- 
pute "transition probabilities" for a comph:x con- 
straint 
Although accurate-but costly- methods to esti- 
mate comt)al;ibility values have been proposed in 
(Pelillo & Hetice 94), we will choose a simpler an(t 
much (:heaptw (:Olntmtationally solution: (JOHll)llt- 
ing the compatibility degree fl)r the manually writ- 
ten constraints using the number of occurr('nees 
of the consl;raint pattern in the training (:orIms to 
comtmte the prol)ability of the restricted word-tag 
pair given the contexl; defined by the constraint a 
II.elaxation doesn't need -as HMMs (h)- the prior 
prot)at)ility of a certain tag for a word, since it is 
not a constraint, but il; Call \])e llSCd to SOt; the 
initial st;at(; to a 11ot templet;ely rall(lol\[I OllC. hfi- 
tially we will assign to each word il;s most I)ro/)able 
tag, so we start optimization in a biassed point. 
Alternative Support l,%nctions 
The sut)port functions described in section 2 
are traditionally used in relaxation algorithnts, it 
seems better for our purt)ose to choose an addi- 
tive one, since the multiplicative flm(:tions might 
yiehl zero or tiny values when -as in Ollr cose- for ,q 
(:crtain val'iable or tag no constraints are available 
for a given subsel; of vm'ial)les. 
Since that fllnt:tions are general, we may try to 
lind ;~ suI)I)ort flmctkm more speciiic tbr our t)rol)- 
h:m. Sin(:e IIMMs lind the maxinmm sequ(:n(:e 
probat)ility and relaxation is a maximizing algo- 
rii;hm, we (:an make relaxation maximize th(,' se- 
(lllenc(? t)robability an(l we should gel; tile same 
results. To a(:hieve this we define a new Sul)port 
flmc, l;ion, which is the sequence i)robability: 
Being: 
t k tile tag for varial)h: 'vk with highest weight value 
a~ the current tilne step. 
7r(Vt, t 1) \[;he probal)ility for t~he sequence to sl;art 
in tag t I. 
P(v,t) the lexical probability for the word repre- 
se\]tted by v to have t;ag t. 
T(tl, I2) the probability of tag t2 given that I;he 
previous one is tl. 
~itj the set of all ternm'y constrainl;s on tag j for 
word i. 
II ,q • H... the :(:t of all hand-written constraints On (;ag 
3 k)r word i. 
We define: 
= × t})× 
N ! 
k--l.,k/i 
aThis is an issue that will require fitrtl,er ati:en- 
lion, since as constraints can be expressed in several 
degrees of g(merality, l;he estimated probabilities may 
vary greatly del)ending on how t;he constraint was 
expressed. 
879 
To obtain the new support function: 
(3.1) 
Compatibility Values 
Identifying compatibility values with transition 
probabilities may be good for n-gram models, but 
it is dubious whether it can be generalized to 
higher degree constraints. In addition we can 
question the appropriateness of using probability 
values to express compatibilities, and try to find 
another set of values that fits better our needs. 
We tried several candidates to represent com- 
patibility: Mutual Information, Association Ratio 
and Relative Entropy. 
This new compatibility measures are not lim- 
ited to \[0, 1\] as probabilities. Since relaxation up- 
dating functions (2.2) and (2.1) need support val- 
ues to be normalized, we must choose some func- 
tion to normalize compatibility values. 
Although the most intuitive and direct scal- 
ing would be the linear function, we will test as 
well some sigmoid-shaped hmctions widely used 
in neural networks and in signal theory to scale 
free-ranging values in a finite interval. 
All this possibilities together with all the pos- 
sibilities of the relaxation algorithm, give a large 
amount of combinations and each one of them is 
a possible tagging algorithm. 
4 Experiments 
To this extent, we have presented the relaxation 
labelling algorithm family, and stated soine con- 
siderations to apply them to POS tagging. 
In this section we will describe the experiments 
performed on applying this technique to our par- 
tieular problem. 
Our experiments will consist of tagging a corpus 
with all logical combinations of the following pa- 
rameters: Support function, Updating function, 
Compatibility values, Normalization function and 
Constraints degree, which can be binary, ternary, 
or hand-written constraints, we will experiment 
with any combination of them, as well as with 
a particular combination consisting of a back-off 
technique described below. 
In order to have a comparison reference we will 
evaluate the pertbrmance of two tuggers: A blind 
most-likely-tag tagger and a HMM tagger (Elwor- 
thy 93) performing Viterbi algorithm. The train- 
ing and test corpora will be the same for all tag- 
germ 
All results are given as precision percentages 
over ambiguous words. 
4.1 Results 
We performed the same experiments on three dif- 
ferent corpora: 
Corpus SN (Spanish Novel) train: 15Kw, test: 
2Kw, tag set size: 70. This corpus was 
chosen to test the algorithm in a language 
distinct than English, and because previous 
work (Moreno-Torres 94) on it provides us 
with a good test bench and with linguist writ- 
ten constraints. 
Corpus Sus (Susanne) train: 141Kw, test: 6Kw, 
tag set, size: 150. The interest of this corpus 
is to test the algorithm with a large tag set. 
Corpus WSJ (Wall Street Journal) 
train: 1055Kw, test: 6Kw, tag set size: 45 
The interest of this corpus is obviously its 
size, which gives a good statistical evidence 
for automatic constraints acquisition. 
Baseline results. 
Results obtained by the baseline tuggers are 
found in table 1. 
SN 
Most-likely 
\[MM 94.62% 
Sus WSJ 
86.01% 88.52% 
93.20% 93.63% 
Table 1: Results achieved by conventional tuggers. 
First; row of table 2 shows the best results ob- 
tained by relaxation when using only binary con- 
straints (B). That is, in the same conditions than 
HMM taggers. In this conditions, relaxation only 
performs better than HMM for the small corpus 
SN, and tile bigger the corpus is, tile worse results 
relaxation obtains. 
Adding hand-written constraints (C). 
Relaxation can deal with more constraints, so 
we added between 30 and 70 hand-written con- 
straints depending on the corpus. The constraints 
were derived ~malyzing the most frequent errors 
committed by tile HMM tagger, except for SN 
where we adapted the context constraints pro- 
posed by (Moreno-Torres 94). 
The constraints do not intend to be a general 
language model, they cover only some common er- 
ror cases. So, experiments with only hand-written 
constraints are not performed. 
The compatibility value for these constraints is 
coinputed from their occurrences in the corpus, 
and may be positive (compatible) or negative (in- 
compatible). 
Second row of table 2 shows the results obtained 
when using binary plus hand-written constraints. 
In all corpora results improve when adding 
hand-written constraints, except in WSJ. This 
is because the constraints used in this case are 
few (about 30) and only cover a few specific er- 
ror cases (mainly tile distinction past/participle 
following verbs to have or to be). 
Using trigram information (T). 
We have also available ternary constraints, ex- 
tracted from trigram occurrences. Results ob- 
880 
I__~_ S N 19"-5.77% 
 _cJ 96.54% 
Sus 91.65% WSJ 
~79.34V7/0 
92.50% 89.24% 
88.6ooof 
8-97~3 3 ~- 
89.83% ~.y8 0/~0, 
Table 2: Best relaxation results using every combina- 
tion of constraint kinds. 
tained using ternary constraints in combination 
with other kinds of information are shown in rows 
T, BT, TC and BTC in table 2. 
There seem to be two tendencies in this table: 
First, using trigrmns is only helpflfl in WSJ. 
This is becmme the training cortms for WSJ is 
much bigger than in the other cases, and so the tri- 
grmn model obtained is good, while, for the ()tiler 
c<)rpora, the training set; seems to t)e too small to 
provide a good trigram iniormation. 
Secondly, we can observe that there is a general 
tendency to "the more information, the better re- 
suits", that ix, when using BTC we get l)etter re- 
suits that with B~, which is in turn better than T 
alone. 
Stopping before eonve~yenee. 
All above results at'(; obtaine.d stopt)ing the re- 
laxation ;algorithm whim it reaches convergence 
(no significant cbmges are l)rodu(:ed fl'om one it- 
eration to the next), but relaxation algorithms not 
necessarily give their l)est results at convergence 4, 
or not always need to achieve convergence to know 
what the result will be (Zucker et al. 81). So they 
are often stoplmd after a few iterations. Actually, 
what we arc (loing is changing our convergen('e cri- 
terion to one more sophisticated than "sto 1) when 
dlere are no Inore changes". 
The results l)resented in table 3 are tit(; best 
overall results dmt we wouM obtain if we had a 
criterion which stopped tit(; iteration f)rocess when 
the result obtained was an optimum. The number 
in parenthesis is the iteration at, which the algo- 
rithm should be stopped. Finding such a criterion 
is ~ point that will require fllrther research. 
(12)\] 93.78% (6) 
Table 3: Best results stopping before conw.~rgence. 
4This is due to two main reasons: (1)2}t,('. optimum 
of tit(*, supI)ort function doesn't correspond ea;actly to 
the best solution for the problem, that is, the chosen 
flmction is only a,n approximation of the desired one. 
And (2) performing too much iterations can produce 
a more probable solution, which will not necessarily 
be the correct one. 
These results are clearly better than those ob- 
tained at; relaxation convergence, and they also 
outperform HMM taggers. 
Searching a more specific support flLnction. 
We have t)een using support fimctions that are 
traditionally used in relaxation, but we might try 
to st)ecialize relaxation labelling to POS tagging. 
Results obtained with this specific sut)t)ort fun(:- 
tion (3.1) are sumntarize.d in table 4 
SN Sus 
Table 4: Best results using a specific support fun<:- 
tkm. 
Using this new supt)ort fun(:tion we obtain re- 
suits slightly below those of the IIMM tagger, 
Our sut)i)ort fun(:tion is tim sequence 1)robal)il- 
ity, which is what Viterbi maxinfizes, 1)ut we get 
worse, results. Tlmrc are two main reasons for 
that. The first one is that relaxation does not 
maximize the sui)t)ort; flln('tion but the weigh, ted 
support for each variable, so we' are not doing 
exactly the same than a HMM tagger. Second 
reason is that relaxation is not an algorithm that 
finds global optima an(1 can be trapl)ed in local 
maxima. 
Combining information in a llack-off h, ierarchy. 
Wh can confl)ine bigram and ti'igranl infi'oma- 
tion in a. back-off mechanism: Use trigrams if 
available and bigrmns when not. 
Results o})tained with that technique at'(', shown 
in table 5 
Sus WSJ \[92.31% (3'-~)_ t 93.66% (4)t94.29% (4)\] 
Table 5: Best; results using ~* back-off' technique. 
The results he, re point to the same conclusions 
than the use of trigrams: il! we have a good trigrmn 
model (as in WSJ) then the back-off" technique 
is usefifl, and we get here the best overall result 
for tiffs corlms. If the trigram model ix not so 
good, results are not better than the obtained with 
l)igrams ahme. 
5 Application to Word Sense 
Disambiguation 
We can apply the same algorithm to the task of 
disambiguating tile sense of a word in a certain 
context. All we need is to state tile <',onslxaints 
between senses of neighbour words. We can coin- 
bine this task with POS tagging, since t, here~ are 
also constraints between the POS tag of a word 
attd its sense, or the sense of a neighbour word. 
881 
Preliminary experiments have been performed 
on SemCor (Miller et al. 93). The problem con- 
sists in assigning to each word its correct POS tag 
and the WordNet file code for its right sense. 
A most-likely algorithm got 62% (over nouns 
apperaring in WN). We obtained 78% correct, 
only adding a constraint stating that the sense 
chosen for a word must be compatible with its 
POS tag. 
Next steps should be adding more constraints 
(either hand written or automatically derived) on 
word senses to improve performance and tagging 
each word with its sense in WordNet instead of its 
file code. 
6 Conclusions 
We have applied relaxation labelling algorithm to 
the task of POS tagging. Results obtained show 
that the algorithm not only can equal markovian 
taggers, but also outperform them when given 
enough constraints or a good enough model. 
The main advantages of relaxation over Marko- 
vian taggers are the following: First of all, relax- 
ation can deal with more information (constraints 
of any degree), secondly, we can decide whether 
we want to use only automatically acquired con- 
straints, only linguist-written constraints, or any 
combination of both, and third, we can tune the 
model (,~dding or changing constraints or compat- 
ibility coefficients). 
We can state that in all experiments, the re- 
finement of the model with hand written con- 
straints led to an improvement in performance. 
We improved performance adding few constraints 
which were not linguistically motiwtted. Probably 
adding more "linguistic" constraints would yield 
more significant improvements. 
Several parametrizations for relaxation have 
been tested, and results seem to indicate that: 
• support function (1.2) produces clearly worse 
results than the others. Support flmction 
(1.1) is slightly ahead (1.3). 
• using mutual information as compatibility 
values gives better results. 
• waiting for convergence is not a good policy, 
and so alternative stopping criterions must be 
studied. 
• the back-off technique, as well as the trigram 
model, requires a really big training corpus. 
7 Future work 
The experiments reported and the conclusions 
stated in this paper seem to provide a solid back- 
ground for further work. We intend to follow sev- 
eral lines of research: 
• Applying relaxation to WSD and to WSD 
p!us POS-tagging. 
• Experiment with different stopt)ing criteri- 
ons. 
• Consider automatically extracted constraints 
(Mhrquez & Rodrlguez 95). 
• Investigate alternative ways to compute 
compatibility degrees for hand-written con- 
straints. 
• Study back-off techniques that take into ac- 
count all classes and degrees of constraints. 
• Experiment stochastic relaxation (Sinmlated 
annealing). 
• Compare with other optimization or con- 
straint satisfaction teehlfiques applied to 
NLP tasks. 
Acknowledgements 
I thank Horacio Rodfguez for his help, support 
and valuable comments on this paper. I also thank 
Kiku Ribas, German Rigau and Pedro Meseguer 
for their interesting suggestions. 
References 
Brill, E.; A simple rule-based part-of-speech tag- 
ger. ANLP 1992 
Cowie, J.; Guthrie, J.; Guthrie, L.; Lexical Disam- 
biguatio'n using Simulated Annealing DARPA 
Speech and Natural Language; Feb. 1992 
Elworthy, D.; Part of Speech and Phrasal Tagging. 
ESPRIT BRA-7315 Acquilex iI, Working Paper 
10, 1993 
Mhrquez, L.; Rodrfguez, H.; Towards Learning a 
Constraint Grammar from Annotated Cool, ova 
Using Decision Trees. ESPRIT BRA-7315 Ac- 
quilex II, Working Paper, 1995 
Miller, G.A.; Leacock, C.; Tengi, R.; Bunker, 
R.T.; A semantic concordance ARPA Wks on 
Human Language Technology, 1993 
Moreno-Torres, I.; A morphological disambigua~ 
tion tool (MDS). An application to Spanish. ES- 
PRIT BRA-7315 Acquilex II, Working Paper 
24, 1994 
Pelillo, M.; Refice M.; Learning Compatibility 
Coefficients for Relaxation Labeling Processes. 
IEEE Trans. on Patt. An. & Maeh. Int. 16, n. 
9 (1994) 
Schmid, It.; Part of Speech lhgging with Neural 
Networks COLING 1994 
Torras, C.; Relaxation and Neural Learning: 
Points of Convergence and Divergence. Jour- 
nal of Parallel and Distributed Computing 6, 
pp.217-244 (1989) 
Zucker, S.W.; Leclerc, Y.G.; Mohammed, J.L.; 
Continuous Relaxation and local maxima selec- 
tion: Conditions for equivalence. IEEE Trans. 
on Patt. An. &Mach. Int. 3, n. 2 (1981) 
882 
