Word Sense Disambiguation Using Pairwise Alignment
Koichi Yamashita Keiichi Yoshida
Faculty of Administration and Informatics
University of Hamamatsu
1230 Miyakoda-cho, Hamamatsu, Shizuoka, Japan
yamasita@hamamatsu-u.ac.jp
Yukihiro Itoh
Abstract
In this paper, we proposed a new super-
vised word sense disambiguation (WSD)
method based on a pairwise alignment
technique, which is used generally to mea-
sure a similarity between DNA sequences.
The new method obtained 2.8%-14.2%
improvements of the accuracy in our ex-
periment for WSD.
1 Introduction
WSD has been recognized as one of the most impor-
tant subjects in natural language processing, espe-
cially in machine translation, information retrieval,
and so on (Ide and V´eronis, 1998). Most of previ-
ous supervised methods can be classified into two
major ones; approach based on association, and ap-
proach based on selectional restriction. The former
uses some words around a target word, represented
by n-word window. The latter uses some syntactic
relations, say, verb-object, including necessarily a
target word.
However, there are some words that one approach
gets good result for them while another gets worse,
and vice versa. For example, suppose that we want
to distinguish between “go off or discharge” and
“terminate the employment” as a sense of “fire”.
Consider the sentence in Brown Corpus1:
My Cousin Simmons carried a musket, but he had
loaded it with bird shot, and as the officer came op-
posite him, he rose up behind the wall and fired.
1In this case, we consider only one sentential context for the
simplicity.
The words such as “musket”, “loaded” and “bird
shot” would seem useful in deciding the sense of
“fire”, and serve as clue to leading the sense to “go
off or discharge”. It seems that there is no clue to an-
other sense. For this case, an approach based on as-
sociation is useful for WSD. However, an approach
based on selectional restriction would not be appro-
priate, because these clues do not have the direct
syntactic dependencies on “fire”. On the other hand,
consider the sentence in EDR Corpus:
Police said Haga was immediately fired from the
force.
The most significant fact is that “Haga” (a person’s
name) appears as the direct object of “fire”. A selec-
tional restriction approach would use this clue ap-
propriately, because there is the direct dependency
between “fire” and “Haga”. However, an associa-
tion approach would make an error in deciding the
sense, because “Police” and “force” tend to be a
noise, from the point of view of an unordered set of
words. Generally, an association does not use a syn-
tactic dependency, and a selectional restriction uses
only a part of words appeared in a sentence.
In this paper, we present a new method for WSD,
which uses syntactic dependencies for a whole sen-
tence as a clue. They contain both of all words in-
cluded in a sentence and all syntactic dependencies
in it. Our method is based on a technique of pair-
wise alignment, and described in the following two
sections. Using our method, we have gotten appro-
priate sense for various cases including above exam-
ples. In section 4, we describe our experimental re-
sult for WSD on some verbs in SENSEVAL-1 (Kil-
garriff, 1998).
2 Our Method
Our method has the features on an association and a
selectional restriction approach both. It can be ap-
plied with the various sentence types because our
method can treat a local (direct) and a whole sen-
tence dependency. Our method is based on the fol-
lowing steps;
Step 1. Parse the input sentence with syntactic
parser2, and find all paths from root to leaves
in the resulting dependency tree.
Step 2. Compare the paths from Step 1. with proto-
type paths prepared for each sense of the target
word.
Step 3. Find a summation of similarity between
each prototype and input path for each sense.
Step 4. Select the sense with the maximum value of
the summation.
We describe our method in detail in the followings.
In our method, we consider paths from root to
leaves in a dependency tree. For example, consider
the sentence “we consider a path in a graph”. This
sentence has three leaves in the dependency struc-
ture, and consequently has three paths from root to
leaves; (consider, SUB, we), (consider, OBJ, path,
a) and (consider, OBJ, path, in, graph, a). “SUB”
and “OBJ” in the paths are the elements added au-
tomatically using some rules in order to make a re-
markable difference between verb-subject and verb-
object. We think this sequence structure of word
would serve as a clue to WSD very well, and we
regard a set of the sequences obtained from an input
sentence as the context of a target word.
The general intuition for WSD is that words
with similar context have the same sense (Charniak,
1993; Lin, 1997). That is, once we prepare the pro-
totype sequences for each sense, we can determine
the sense of the target word as one with the most
similar prototype set. We measure a similarity be-
tween a set of prototype sequences T and a set of
sequences from input sentence Ta0 . Let T and Ta0
have a set of sequences, PT a1a3a2 p1
a4
p2
a4a6a5a7a5a7a5a7a4
pn
a8
and
2We assume that we can get the correct syntactic structure
here. (See section 4)
fire: go off or discharge
fire, SUB, person
fire, OBJ, [weapon, rocket]
fire, [on, upon, at], physical object
fire, *, load, [into, with], weapon
fire, *, set up, OBJ, weapon
fire: terminate the employment
fire, SUB, company
fire, OBJ, [person, people, staff]
fire, from, organization
fire, *, hire
fire, *, job
Figure 1: Prototype sequence for verb “fire”
PT
a9
a1a10a2 p
a0
1a4
pa02
a4a6a5a7a5a7a5a7a4
pa0m
a8
respectively. pi and pa0j are se-
quences of words. We define the similarity between
T and Ta0 , sima11 T
a4
Ta0a13a12 , as following:
sima11 T
a4
Ta0 a12 a1 ∑
pi
a14
PT
fi max
pa9j
a14
PT
a9
alignmenta11 pi
a4
pa0ja12 (1)
sima11 T
a4
Ta0 a12 is not commutative. That is, sima11 T
a4
Ta0 a12a16a15a1
sima11 Ta0
a4
Ta12 . alignmenta11 pi
a4
pa0ja12 is an alignment score
between the sequences pi and pa0j, defined in the next
section. fi is a weight function characteristic of the
sequence pi, defined as following:
fi a1
a17 u
i if maxp
a9j
a14
PT
a9
alignmenta11 pi
a4
pa0ja12a19a18 ti
vi otherwise
(2)
where ui and vi are arbitrary constants and ti is arbi-
trary threshold.
Using equation (1), we can estimate a similarity
between the context of a target word and prototype
context, and can determine the sense of a target word
by selecting the prototype with the maximum simi-
larity.
An example of the prototype sequences for verb
“fire” is shown in Figure 1. A prototype sequence
is represented like a regular expression. For the
present, we obtain the sequence by hand. The basic
policy to obtain prototypes is to observe the common
features on dependency trees in which target word is
used in the same sense. We have some ideas about a
method to obtain prototypes automatically.
3 Pairwise Alignment
We attempt to apply the method of pairwise align-
ment to measuring the similarity between sequences.
Recently, the technique of pairwise alignment is
worked
at
composition
the
is make at home
1.000 0.500
1.000
0.595
-1
-1
-1
-1-1
-1
-1
-1
-1 -1 -1 -1
-1 -1
-1-1-1
-1
-1
-1 -1
-1-1 -1
-1 -1 -1 -1
-1 -1 -1 -1
-1 -1 -1 -1
-1 -1 -1 -1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
alignment : (worked)(      ) (at)(at) (composition)(           ) (the)(make)is home- -
score : 0.595
= (worked, at, composition, the)
= (is, make, at, home)
p
p’
Figure 2: Pairwise alignment
used generally in molecular biology research as
a basic method to measure the similarity between
proteins or DNA sequences (Mitaku and Kanehisa,
1995).
There have been several ways to find the pairwise
alignment, such as the method based on Dynamic
Programming, one based on Finite State Automa-
ton, and so on (Durbinet al., 1998). In our method,
we apply the method using DP matrix, as in Fig-
ure 2. We have shown the pairwise alignment be-
tween sequences p a1 (worked, at, composition, the)
and pa0 a1 (is, make, at, home) as an example.
In a matrix, a vertical and horizontal transition
means a gap and is assigned a gap score. A diag-
onal transition means a substitution and is assigned
a score based on the similarity between two words
corresponding to that point in the matrix. Actually,
the following value is calculated in each node, using
values which have been calculated in its three previ-
ous nodes.
Fi
a20 j
a1 max
a21
a22a23
Fi
a24 1a20 j a25
substa11 -
a4
wa0ja12
Fi
a20 ja24 1 a25
substa11 wi
a4
-a12
Fi
a24 1a20 ja24 1 a25
substa11 wi
a4
wa0ja12
(3)
where substa11 -
a4
wa0ja12 and substa11 wi
a4
-a12 represent re-
spectively to substitute wa0j and wi with a gap (-),
and return the gap score. substa11 wi
a4
wa0ja12 represent the
score of substituting wi with wa0j or vice versa.
Now let the word w has synsets s1
a4
s2
a4a6a5a7a5a7a5a26a4
sk and
wa0 has sa01
a4
sa02
a4a6a5a7a5a7a5a7a4
sa0l on WordNet hierarchy (Miller et
al., 1990). For simplicity, we define the substa11 w
a4
wa0 a12
as following, based on the semantic distance (Stetina
and Nagao, 1998).
substa11 w
a4
wa0 a12 a1 2
a5
max
ia20 j
a11 sda11 si
a4
sa0ja12a6a12a28a27 1 (4)
where sda11 si
a4
sa0ja12 is the semantic distance between
two synsets si and sa0j. Because 0 a29 sda11 si
a4
sa0ja12 a29 1,
a27 1
a29 substa11 w
a4
wa0 a12 a29 1. The score of the substitution
between identical words is 1, and one between two
words with no common ancestor in the hierarchy is
a27 1. We simply define the gap score as a27 1.
4 Experimental Result
Up to the present, we have obtained the experimental
results on 7 verbs in SENSEVAL-13. In our exper-
iment, for all sentences including target word in the
training and test corpus of SENSEVAL-1, we make
a parsing using Apple Pie Parser (Sekine, 1996)
and additional vertices using some rules automati-
cally. If the resulted parsing includes some errors,
we remove them by hand. Then we obtain the se-
quence patterns by hand from training data and at-
tempt WSD using equation (1) for test data. Because
of various length of sequence, we assign score zero
to the preceding and right-end gaps in an alignment.
We show our experimental results in Table 1. In
SENSEVAL-1, precisions and recalls are calculated
by three scoring ways, fine-grained, mixed-grained
and coarse-grained scoring. We show the results
only by fine-grained scoring which is evaluated by
distinguishing word sense in the strictest way. It
is impossible to make simple comparison with the
participants in SENSEVAL-1 because our method
needs supervised learning by hand. However, 2.8%-
14.2% improvements of the accuracy compared with
the best system seems significant, suggesting that
our method is promising for WSD.
5 Future Works
There are two major limitations in our method; one
of syntactic information and of knowledge acquisi-
3We have experimented on verbs in SENSEVAL-1 one by
one alphabetically. The word “amaze” is omitted because it has
only one verbal sense.
Table 1: Experimental results for some verbs (in
fine-grained scoring)
bet the numbers of test instance:117
precision (recall)
our method 0.880 (0.880)
best system in SENSEVAL-1 0.778 (0.778)
human 0.924 (0.916)
bother the numbers of test instance:209
precision (recall)
our method 0.900 (0.900)
best system in SENSEVAL-1 0.866 (0.866)
human 0.976 (0.976)
bury the numbers of test instance:201
precision (recall)
our method 0.667 (0.667)
best system in SENSEVAL-1 0.572 (0.572)
human 0.928 (0.923)
calculate the numbers of test instance:218
precision (recall)
our method 0.950 (0.950)
best system in SENSEVAL-1 0.922 (0.922)
human 0.954 (0.950)
consume the numbers of test instance:186
precision (recall)
our method 0.645 (0.645)
best system in SENSEVAL-1 0.503 (0.500)
human 0.944 (0.939)
derive the numbers of test instance:217
precision (recall)
our method 0.751 (0.751)
best system in SENSEVAL-1 0.664 (0.664)
human 0.965 (0.961)
float the numbers of test instance:229
precision (recall)
our method 0.616 (0.616)
best system in SENSEVAL-1 0.555 (0.555)
human 0.927 (0.923)
tion by hand.
The former is that our method assumes we can get
the correct syntactic information. In fact, the accu-
racy and performance of syntactic analyzer are being
improved more and more, consequently this disad-
vantage would become a minor problem. Because a
similarity between sequences derived from syntactic
dependencies is calculated as a numerical value, our
method would also be suitable for integration with a
probabilistic syntactic analyzer.
The latter, which is more serious, is that the se-
quence patterns used as clue to WSD are acquired
by hand at the present. In molecular biology re-
search, several attempts to obtain sequence patterns
automatically have been reported, which can be ex-
pected to motivate ours for WSD. We plan to con-
struct an algorithm for an automatic pattern acquisi-
tion from large scale corpora based on those biolog-
ical approaches.
References
Eugene Charniak. 1993. Statistical Language Learning.
MIT Press, Cambridge.
Richard Durbin, Sean R. Eddy, Andrew Krogh and
Graeme Mitchison. 1998. Biological Sequence Anal-
ysis: Probabilistic Models of Proteins and Nucleic
Acids. Cambridge University Press.
Marti A. Hearst. 1991. Noun Homograph Disambigua-
tion Using Local Context in Large Text Corpora. In
Proceedings of the 7th Annual Conference of the Uni-
versity of Waterloo Center for the New OED and Text
Research, pp.1-22.
Nancy Ide and Jean V´eronis. 1998. Introduction to the
Special Issue on Word Sense Disambiguation: The
State of the Art. Computational Linguistics, 24(1):1-
40.
Adam Kilgarriff. 1998. Senseval: An exercise in eval-
uating word sense disambiguation programs. In Pro-
ceedings of the 1st International Conference on Lan-
guage Resources and Evaluation (LREC98), volume 1,
pp.581-585.
Dekang Lin. 1997. Using Syntactic Dependency as Lo-
cal Context to Resolve Word Sense Ambiguity. In
Proceedings of ACL/EACL-97, pp.64-71.
Christopher D. Manning and Hinrich Sch¨utze. 1999.
Foundations of Statistical Natural Language Process-
ing. MIT Press, Cambridge.
George A. Miller, Richard Beckwith, Christiane Fell-
baum, Derek Gross and Katherine J. Miller. 1990. In-
troduction to WordNet: an on-line lexical database. In
International Journal of Lexicography, 3(4):235-244.
Shigeki Mitaku and Minoru Kanehisa (ed). 1995. Hu-
man Genom Project and Knowledge Information Pro-
cessing (in Japanese). Baifukan.
Satoshi Sekine. 1996. Manual of Apple Pie Parser.
URL: http://nlp.cs.nyu.edu/app/.
Jiri Stetina and Makoto Nagao. 1998. General Word
Sense Disambiguation Method Based on a Full Sen-
tential Context. In Journal of Natural Language Pro-
cessing, 5(2):47-74.
