WSD and Closed Semantic Constraint
Jiangsheng Yu∗
Institute of Computational Linguistics
Peking University, Beijing, China, 100871
Abstract The application-driven construction of lexicons
has recently been emphasized as a methodology of
Computational Lexicology. We focus on the closed semantic
constraint that noun concepts impose on the argument(s)
of each verb concept in a WordNet-like lexicon, a problem
that is theoretically related to Word Sense Disambiguation
(WSD) at different levels. From the viewpoint of the
Dynamic Lexicon, WSD provides a way to construct the
closed semantic constraints automatically and, in turn,
benefits from these semantic descriptions.
Keywords dynamic lexicon, evolution, WSD,
WordNet-like lexicon, closed semantic constraint
1 Introduction
As the underlying resource of semantic analysis,
the most important descriptions in a semantic lexicon
are the relationships between verbs and nouns,
which usually come down to the closed semantic
constraint of each argument of each verb. Once the
structure of the semantic lexicon is determined,
the closed semantic constraint becomes a well-defined
problem.
Example 1.1 The verb dǎ has many meanings
in Chinese, which differ in dǎ háizi (punish
the child), dǎ máoyī (knit the sweater),
dǎ jiàngyóu (buy soy sauce), etc. In fact, the
senses of dǎ are distinguished by the semantics of
its arguments.
In contrast to the traditional view of the lexicon, we
advocate the conception of a dynamic lexicon ([15]) and
its evolution oriented to some particular application,
which is discussed in Section 2. The WordNet-like
lexicon is treated as a dynamic one, which means that
the structures representing semantic knowledge can be
changed according to some empirical standards. In
Section 3, we define WSD based on the WordNet-like
lexicon, and then discuss the training of the concept
TagSet and the statistical model of WSD. Using the
statistical WSD, in Section 4 we introduce an approach
to the automatic construction of the closed semantic
constraints in a WordNet-like lexicon. The last section
is the conclusion.

∗ This paper is supported by National Foundation of
Natural Science (Research on Chinese Information
Extraction), No. 69483003 and Project 985 in Peking
University.
2 Dynamic Lexicon and Its Structural Evolution
Definition 2.1 A dynamic lexicon is a triple
Lex = 〈S,R,T〉 in which
1. S is a well-structured set with a type¹ t,
2. R is the set of deductive rules on S, and
3. T is the set of all structural transformations
of S, keeping the type t.
Definition 2.2 Lexicon $\langle S', R, T \rangle$ is called the
evolution result of the lexicon $\langle S, R, T \rangle$ if
$\exists t \in T^{*}$ such that $S \xrightarrow{t} S'$ (or briefly
$S \rightsquigarrow S'$). The process that takes a dynamic lexicon
to its evolution result is called an evolution. Obviously,
$T$ is a group with the operation of composition.
Definition 2.3 〈S,R,T〉 is called simple structured
if T is a commutative group, otherwise complex
structured.
The more complex the structure of S, the more
difficult it is to apply R and T. Since part of the
semantic knowledge is represented by the structure
itself, balancing the complexity of the structure
against that of R (or T) is one of the serious
problems in Computational Lexicology.
Definition 2.4 Let $\Omega(S)$ denote the least number
of operations constructing $S$, and $\Omega(S \rightsquigarrow S')$ the
least number of operations from $S$ to $S'$. It’s easy
to verify that
Theorem 2.1 $\Omega(\cdot)$ is a distance, i.e., it satisfies
that $\forall S, S', S''$,
1. $\Omega(S \rightsquigarrow S') \ge 0$
2. $\Omega(S \rightsquigarrow S') = 0 \Leftrightarrow S = S'$
3. $\Omega(S \rightsquigarrow S') = \Omega(S' \rightsquigarrow S)$
4. $\Omega(S \rightsquigarrow S'') \le \Omega(S \rightsquigarrow S') + \Omega(S' \rightsquigarrow S'')$
Corollary 2.1 $\Omega(S') \le \Omega(S) + \Omega(S \rightsquigarrow S')$

¹ For instance, a labeled tree or a complete lattice.

Definition 2.5 The degree of structural destruction
from $S$ to $S'$ is defined by
$$\rho(S \rightsquigarrow S') = 1 - \frac{\Omega(S')}{\Omega(S) + \Omega(S \rightsquigarrow S')} \tag{1}$$
Property 2.1 $0 \le \rho(S \rightsquigarrow S') \le 1$
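Property 2.1 can be checked directly from Corollary 2.1. The short derivation below is only a sketch; it assumes that the denominator $\Omega(S) + \Omega(S \rightsquigarrow S')$ is positive and uses the fact that $\Omega$ counts operations and is therefore non-negative.

```latex
\begin{align*}
  & 0 \le \Omega(S') \le \Omega(S) + \Omega(S \rightsquigarrow S')
    \quad \text{(Corollary 2.1)} \\
  \Longrightarrow\ & 0 \le \frac{\Omega(S')}{\Omega(S) + \Omega(S \rightsquigarrow S')} \le 1 \\
  \Longrightarrow\ & 0 \le \rho(S \rightsquigarrow S')
    = 1 - \frac{\Omega(S')}{\Omega(S) + \Omega(S \rightsquigarrow S')} \le 1
\end{align*}
```

In particular, $\rho(S \rightsquigarrow S') = 0$ exactly when $\Omega(S') = \Omega(S) + \Omega(S \rightsquigarrow S')$, i.e., when constructing $S$ first and then transforming it wastes no operations compared with constructing $S'$ directly.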
Definition 2.6 Let $S \rightsquigarrow S_1 \rightsquigarrow \cdots \rightsquigarrow S_n \rightsquigarrow \cdots$
be a sequence of evolutions. The sequence is called
convergent if there exists a constant $A$ s.t.
$0 \le A \le 1$ and $\lim_{n \to \infty} \rho(S \rightsquigarrow S_n) = A$.
It is easy to see that a local evolution of the lexicon
may not be an optimization, even for a specific
application. The index ρ indicates the convergence of
the lexical structure, which guarantees stable
machine learning of the dynamic lexicon. In fact,
the structure of so-called common knowledge
is nothing but a statistical distribution, which is
affected by culture and personal experience.
For a particular application, such as IE, IR or MT,
appropriate semantic descriptions in a WordNet-like
lexicon seem necessary.
Example 2.1 $C$ = {earthquake, quake, temblor, seism}
is not only a kind of $C'$ = {geological phenomenon},
but also a kind of $C''$ = {natural disaster}.
3 WSD based on WordNet-like Lexicon
What does it mean for a machine to understand
a given sentence S or a text T? As we know,
a Turing Test of NLU involves at least the
meaning of every word w in S or T. Thus WSD, as a
prerequisite, is to tag the semantic information
of w automatically. WordNet² at Princeton University,
despite its disputed quality, provides
an approach to the formalization of concepts in
natural language, in which a concept is defined by
a synonym set (SynSet). A more important contribution
of WordNet is the construction of a well-structured
concept network based on the hypernymy relation
(the main framework) and other accessorial relations,
such as the antonymy (opposite) relation, the holonymy
relation, entailment, cause, etc.
Definition 3.1 A WordNet-like lexicon is a dynamic
lexicon with the type of WordNet:
1. restricted to each category, S is a labeled tree
from the viewpoint of the hypernymy relation
for both noun concepts and verb concepts,
2. some accessorial relations between the noun
(or verb) concepts, and
3. closed semantic constraint of the argument(s)
of each verb concept from the noun concepts.

² The specification of WordNet can be found in [3], [4],
[5], [9], [10], [11], etc.

The WordNet-like lexicon is complex structured. It
need not share the ontology of WordNet, nor its
semantic knowledge representations. But the
description method appears to be a general format for
all languages, as suggested by EuroWordNet (see
[12]), Chinese Concept Dictionary (CCD, see [7],
[13], [14] and [15]), Korean WordNet, Tamil WordNet,
etc.
Definition 3.2 Let $\Sigma$ be the set of all words;
then $\Gamma$, the set of all concepts (or SynSets) in a
WordNet-like lexicon, is a subset of $2^{\Sigma}$. The set
of all SynSets containing $w$ is denoted by $\Delta(w)$, in
which each element is called a sense of $w$.
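To make Definition 3.2 concrete, the following minimal sketch represents concepts as frozen sets of words and computes ∆(w) by membership. The toy concept set `GAMMA` and the function name `senses` are hypothetical illustrations, not data or code from any real lexicon.

```python
from typing import FrozenSet, Set

SynSet = FrozenSet[str]

# Hypothetical toy concept set Gamma, a subset of the power set of all words.
GAMMA: Set[SynSet] = {
    frozenset({"bank", "depository"}),
    frozenset({"bank", "riverside"}),
    frozenset({"earthquake", "quake", "temblor", "seism"}),
}


def senses(word: str, gamma: Set[SynSet]) -> Set[SynSet]:
    """Delta(w): every SynSet in gamma that contains the word."""
    return {synset for synset in gamma if word in synset}


if __name__ == "__main__":
    # "bank" is ambiguous in the toy lexicon, "quake" is not.
    print(len(senses("bank", GAMMA)), len(senses("quake", GAMMA)))
```

A word with exactly one element in ∆(w) needs no disambiguation; WSD only has work to do when ∆(w) contains several SynSets.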
Definition 3.3 Given a well-defined sentence
$S = w_1 w_2 \cdots w_n$, WSD is the computable process
which tags each $w_i$ with a unique sense
$s_i = \{w_i, w_{i1}, \cdots, w_{ik}\}$ such that each derived
combinatorial path is a well-defined sentence with the
semantics of $S$. The Principle of Substitution provides
a corpus-based empirical approach to test whether
a SynSet is well defined or not. The SynSet is the
smallest unit in a WordNet-like lexicon and
underlies the structural descriptions between
the concepts.
The training of the concept TagSet and the statistical
model of WSD interact with each other, which is the main
idea of our approach to WSD based on a WordNet-like
lexicon.
3.1 The Training of TagSet
The traditional semantic tags come from some ontology,
whose a priori character is often criticized
by computational linguists. For us, the empirical
method must permeate each step of WSD
because of the complexity of language knowledge.
The statistical approach to WSD needs a well
concept-tagged corpus as the training set for the
concept TagSet and the statistical data in the Hidden
Markov Model (HMM). To avoid the sparse
data problem, only a few proper subsets of Γ can
act as the TagSet in the statistical model (see [15]
and [16]). The first step leads to a set of structured
TagSets {T1,T2,··· ,Tm}; the second step is
to choose the most efficient one, i.e., the one that
gives the best accuracy of statistical concept tagging.
Unlike unstructured tags, these tags come with a deductive
rule along the hypernymy trees, which works out the sense
of $w$ by the following property:
Property 3.1 Suppose that the TagSet is
$T = \{C_1, C_2, \cdots, C_k\}$, and the word $w$ in a given
sentence is tagged by $C_i$. Then the sense of $w$ here
is the SynSet $C$ which satisfies $C_i \preceq C$ and
$w \in C$, where $\preceq$ is the partial ordering of the nodes
in the hypernymy tree.
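The deduction in Property 3.1 can be sketched as a walk up the hypernymy tree. The sketch assumes that the partial ordering is read as "is an ancestor of or equal to", which matches the dominance relation with the root as least element; the tree, the SynSets and the names `PARENT`, `SYNSET`, `dominates` and `resolve_sense` are hypothetical.

```python
from typing import Dict, FrozenSet, List, Optional

# Hypothetical toy hypernymy tree: node -> father (None for the root).
PARENT: Dict[str, Optional[str]] = {
    "entity": None,
    "institution": "entity",
    "landform": "entity",
    "bank#1": "institution",
    "bank#2": "landform",
}

# Hypothetical SynSet attached to each node.
SYNSET: Dict[str, FrozenSet[str]] = {
    "entity": frozenset({"entity"}),
    "institution": frozenset({"institution", "establishment"}),
    "landform": frozenset({"landform"}),
    "bank#1": frozenset({"bank", "depository"}),
    "bank#2": frozenset({"bank", "riverside"}),
}


def dominates(tag: str, node: str) -> bool:
    """True if tag equals node or is one of its ancestors."""
    current: Optional[str] = node
    while current is not None:
        if current == tag:
            return True
        current = PARENT[current]
    return False


def resolve_sense(word: str, tag: str) -> List[str]:
    """Nodes C with tag dominating C and word in the SynSet of C."""
    return [n for n in PARENT if word in SYNSET[n] and dominates(tag, n)]


if __name__ == "__main__":
    print(resolve_sense("bank", "institution"))
```

A tag that sits too high in the tree (here `entity`) leaves more than one candidate, which is one reason the choice of TagSet matters for tagging accuracy.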
3.2 Statistical Model of WSD
In some sense, WSD is the kernel problem of both
NLU and NLP ([1], [6], [8]). POS and concept tag
are two random variables in the HMM of WSD.
Sometimes the POS of w determines its sense,
sometimes not. But in most cases, a sense of w implies
a unique POS. The distribution of w’s senses with
the POS, P, is important in (POS, concept)-tagging.
A Hidden Markov Model with two parameters
is adopted as the main statistical
model for WSD, while Statistical Decision Theory
and Bayesian Analysis, which are good at
analyzing small samples, are used for comparison.
The training corpus, T, is tagged by hand,
with a cursor-sensitive display of the senses
providing help information.
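Definition 3.4 Consider the well-defined sentence
$S = w_1 w_2 \cdots w_n$. By the lexicon, let
$$S = w_1/P_1^{(i)} \; w_2/P_2^{(i)} \cdots w_n/P_n^{(i)}$$
be a possible POS-tagged result, where $i \in I$. Define
$$\begin{aligned}
f(i) &= \operatorname*{arg\,max}_{j \in J} P(C_1^{(i,j)} \cdots C_n^{(i,j)} \mid P_1^{(i)} \cdots P_n^{(i)}) \\
     &= \operatorname*{arg\,max}_{j \in J} P(C_1^{(i,j)} \cdots C_n^{(i,j)}, P_1^{(i)} \cdots P_n^{(i)}) \\
     &= \operatorname*{arg\,max}_{j \in J} P(C_1^{(i,j)} \cdots C_n^{(i,j)})
\end{aligned} \tag{2}$$
The last equality relies on each concept implying a
unique POS. The HMM of concept can simulate the HMM with
two parameters of (POS, concept). $f(i)$ in (2) is
simplified to
$$f(i) = \operatorname*{arg\,max}_{j \in J} P(C_1^{(i,j)}) \prod_{k=2}^{n} P(C_k^{(i,j)} \mid C_{k-1}^{(i,j)}) \tag{3}$$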
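The paper does not say how the maximization in (3) is carried out. For a first-order chain of this form, a standard choice is the Viterbi dynamic program, sketched below under that assumption. The names `candidates`, `initial` and `transition` and all probabilities are hypothetical, and raw probabilities are multiplied for readability (a practical version would add log-probabilities and smoothing).

```python
from typing import Dict, List, Tuple


def viterbi_concepts(
    candidates: List[List[str]],
    initial: Dict[str, float],
    transition: Dict[Tuple[str, str], float],
) -> List[str]:
    """Return the concept sequence maximising P(C1) * prod_k P(Ck | Ck-1).

    candidates[k] lists the concept tags allowed for the k-th word.
    """
    # best[c] = (probability of the best path ending in c, that path)
    best = {c: (initial.get(c, 0.0), [c]) for c in candidates[0]}
    for options in candidates[1:]:
        step = {}
        for c in options:
            prob, path = max(
                (
                    (p * transition.get((prev, c), 0.0), prev_path)
                    for prev, (p, prev_path) in best.items()
                ),
                key=lambda item: item[0],
            )
            step[c] = (prob, path + [c])
        best = step
    return max(best.values(), key=lambda item: item[0])[1]


# Hypothetical toy model for a two-word sentence.
TOY_CANDIDATES = [["A1", "A2"], ["B1", "B2"]]
TOY_INITIAL = {"A1": 0.6, "A2": 0.4}
TOY_TRANSITION = {
    ("A1", "B1"): 0.1,
    ("A1", "B2"): 0.2,
    ("A2", "B1"): 0.9,
    ("A2", "B2"): 0.1,
}

if __name__ == "__main__":
    print(viterbi_concepts(TOY_CANDIDATES, TOY_INITIAL, TOY_TRANSITION))
```

In the toy model the locally most probable first tag (`A1`) is not part of the best sequence, which illustrates why the product in (3) has to be maximized over whole sequences rather than word by word.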
Property 3.2 There exists a unique map $g$ from
the set $\{P_1^{(i)} P_2^{(i)} \cdots P_n^{(i)} \mid i \in I\}$ to the set
$\{C_1^{(i,j)} C_2^{(i,j)} \cdots C_n^{(i,j)} \mid (i,j) \in I \times J\}$, which satisfies
$$g(P_1^{(i)} \cdots P_n^{(i)}) = C_1^{(i,f(i))} \cdots C_n^{(i,f(i))} \tag{4}$$
where $\forall i,k$, $\exists C \in \Delta(w_k)$ s.t. $C_k^{(i,f(i))} \preceq C$. If
there is $C' \ne C$ satisfying $C' \in \Delta(w_k)$ and
$C_k^{(i,f(i))} \preceq C'$, then the one that is more frequent in
the sense distribution of $w_k$ is the selected sense of $w_k$.
Property 3.3 Let $s = w_1 w_2 \cdots w_n$ be any possible
segmented sequence of $S$, corresponding to
a set of probabilities of POS sequences
$A_s = \{P(P_1^{(i)} P_2^{(i)} \cdots P_n^{(i)}) \mid i \in I\}$. Each
$P_1^{(i)} P_2^{(i)} \cdots P_n^{(i)}$ corresponds to a set of
probabilities of concept sequences
$B_s^{(i)} = \{P(C_1^{(i,j)} C_2^{(i,j)} \cdots C_n^{(i,j)}) \mid j \in J\}$,
where $C_k^{(i,j)}$ has the POS $P_k^{(i)}$. Then
$$\operatorname*{arg\,max}_{s} \bigl(a \cdot \max_{s}(A_s) + b \cdot \max_{i,s}(B_s^{(i)})\bigr) \tag{5}$$
is the choice of segmentation, where $a > 0$, $b > 0$
and $a + b = 1$. More precisely, (5) is rewritten as
$$\operatorname*{arg\,max}_{s} \Bigl\{ \max_{i} \bigl\{ a \cdot P(P_s^{(i)}) + b \cdot P(g(P_s^{(i)})) \bigr\} \Bigr\} \tag{6}$$
where $P_s^{(i)} = P_1^{(i)} P_2^{(i)} \cdots P_n^{(i)}$.
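As an illustration of (6), the sketch below scores each candidate segmentation by the best weighted sum over its POS sequences. It assumes the two probabilities for every POS sequence have already been computed; the dictionary layout, the function name `choose_segmentation` and the toy numbers are hypothetical.

```python
from typing import Dict, List, Tuple


def choose_segmentation(
    scores: Dict[str, List[Tuple[float, float]]], a: float = 0.5
) -> str:
    """Pick the segmentation s maximising max_i (a*P(POS_i) + b*P(g(POS_i))).

    scores[s] holds one pair (P(POS sequence), P(best concept sequence))
    for every POS sequence i of segmentation s.
    """
    if not 0.0 < a < 1.0:
        raise ValueError("a must lie strictly between 0 and 1")
    b = 1.0 - a
    return max(
        scores,
        key=lambda s: max(a * p_pos + b * p_con for p_pos, p_con in scores[s]),
    )


# Hypothetical toy scores for two segmentations of the same string.
TOY_SCORES = {
    "AB C": [(0.20, 0.10), (0.30, 0.05)],
    "A BC": [(0.25, 0.30)],
}

if __name__ == "__main__":
    print(choose_segmentation(TOY_SCORES, a=0.5))
    print(choose_segmentation(TOY_SCORES, a=0.99))
```

With equal weights the concept evidence favours the second toy segmentation, while a weight close to 1 makes the POS probability dominate and selects the first, so the choice of a and b directly controls how much WSD influences segmentation.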
4 WSD driven Closed Semantic Constraint
From the corpus and the statistical WSD, we can
make an induction of the arguments along the
hypernymy tree, which leads to the closed semantic
constraints automatically. At the same time, the
closed semantic constraints also provide a possible
approach to the empirical optimization of $\Gamma_N$ and
$\Gamma_V$. The total optimization of a WordNet-like
lexicon, however, is still an open problem.
4.1 Similarity between Concepts
Definition 4.1 A labeled tree is a 5-tuple T =
〈N,Q,D,P,L〉 satisfying that:
1. N is a finite set of nodes
2. Q is a finite set of labels
3. D is a partial ordering on N, called domi-
nance relation
4. P is a strict partial ordering on N, called
precedence relation
5. (∃x ∈ N)(∀y ∈ N)[(x,y) ∈ D]
6. (∀x,y ∈ N)[[(x,y) ∈ P ∨ (y,x) ∈ P] ↔
[(x,y) ∉ D ∧ (y,x) ∉ D]]
7. (∀x,y,z,w ∈ N)[[(w,x) ∈ P ∧ (w,y) ∈ D ∧
(x,z) ∈ D] → (y,z) ∈ P]
8. L : N → Q is a label map
Definition 4.2 A hypernymy tree is a labeled
tree in which the label map is one-to-one. We
always presume that the hypernymy tree is not
degenerate.
In a hypernymy tree of a WordNet-like lexicon,
a node is a code and a label is a SynSet.
Since the label map is injective, without loss of
generality a SynSet is usually denoted by a node.
We assume that the precedence relation between
brother nodes always implies an ordering of
time, usage, frequency, mood, etc. For instance,
{spring,springtime} ≺ {summer,summertime} ≺
{fall,autumn} ≺ {winter,wintertime} as the
hyponyms of {season,time of year}.
Definition 4.3 Let f,b and B denote father, the
nearest younger-brother and the nearest elder-
brother respectively, satisfying that f = fb,f =
fB and Bb = bB = 1.
Definition 4.4 $\forall x,y \in N$, let $z \in N$ be their
nearest common ancestor, satisfying $z = f^m(x)$ and
$z = f^n(y)$, and let $D(x,y) \overset{\mathrm{def}}{=} m + n$.
$k \in \mathbb{N}$ is called the offset of $x$ from its eldest
brother if $\exists B^k(x)$ and $\nexists B^{k+1}(x)$. Let the
offset of $y$ be $l$; the similarity between $x$ and $y$ is:
• If $mn = 1$, $S(x,y) \overset{\mathrm{def}}{=} \langle 0, |k - l| \rangle$
• If $mn \ne 1$, $S(x,y) \overset{\mathrm{def}}{=} \langle m + n, 0 \rangle$
Definition 4.5 Suppose that $S(x_1,y_1) = \langle a_1,b_1 \rangle$
and $S(x_2,y_2) = \langle a_2,b_2 \rangle$; the comparison of
similarities is defined as follows:
1. $a_1 = a_2$
• $S(x_1,y_1) \preceq S(x_2,y_2) \leftrightarrow b_1 \le b_2$
2. $a_1 \ne a_2$
• If $a_1 < a_2$, then $S(x_1,y_1) \prec S(x_2,y_2)$
• If $a_1 > a_2$, then $S(x_2,y_2) \prec S(x_1,y_1)$
Theorem 4.1 $\langle \{S(x,y) \mid x,y \in N\}, \preceq \rangle$ is a totally
ordered set.
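Definitions 4.4 and 4.5 can be sketched on a small ordered tree. The sketch assumes that the precedence relation is given by the order of each child list (eldest first), so the offset of a node is its index among its brothers; the tree and the names `CHILDREN`, `PARENT`, `ancestors`, `offset` and `similarity` are hypothetical. Python compares tuples lexicographically, which coincides with the comparison in Definition 4.5.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy tree: each node maps to its ordered children (eldest first).
CHILDREN: Dict[str, List[str]] = {
    "time": ["season", "weekday"],
    "season": ["spring", "summer", "fall", "winter"],
    "weekday": ["monday"],
}

# Father map derived from CHILDREN; the root has no father.
PARENT: Dict[str, Optional[str]] = {"time": None}
for father, kids in CHILDREN.items():
    for kid in kids:
        PARENT[kid] = father


def ancestors(node: str) -> List[str]:
    """Chain node, f(node), f(f(node)), ... up to the root."""
    chain = [node]
    while PARENT[chain[-1]] is not None:
        chain.append(PARENT[chain[-1]])
    return chain


def offset(node: str) -> int:
    """Number of elder brothers of node (0 for an eldest child or the root)."""
    father = PARENT[node]
    return 0 if father is None else CHILDREN[father].index(node)


def similarity(x: str, y: str) -> Tuple[int, int]:
    """S(x, y) from Definition 4.4 as a pair (a, b)."""
    up_x, up_y = ancestors(x), ancestors(y)
    z = next(a for a in up_x if a in up_y)  # nearest common ancestor
    m, n = up_x.index(z), up_y.index(z)
    if m * n == 1:  # x and y are brothers
        return (0, abs(offset(x) - offset(y)))
    return (m + n, 0)


if __name__ == "__main__":
    print(similarity("spring", "summer"), similarity("spring", "monday"))
```

Because the pairs are ordinary tuples, `sorted` or `min` can rank candidate concepts by similarity directly, which is the property stated in Theorem 4.1.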
The elementary structural transformations in a
WordNet-like lexicon include:
1. insert a non-root brother-node;
2. collapse a non-root node to its father-node;
3. add a new root;
4. add a link between two labeled trees;
5. delete a link between two labeled trees.
4.2 Induction of Constraints
$\Gamma_N$ (or $\Gamma_V$) denotes the set of noun (or verb)
concepts. Let $C \in \Gamma_V$ be a verb concept with one
argument. Suppose that we have obtained the initial
closed semantic constraint of its argument,
$C' \in \Gamma_N$, from a concept-tagged sentence. A
link from $C'$ to $C$ is added between $\Gamma_N$ and $\Gamma_V$.
If $C''$ from another sentence is also a closed
semantic constraint of $C$’s argument, then the
infimum of $C'$ and $C''$, $\inf(C', C'')$, is the new $C'$.
If, $\forall x \in \Gamma$ with $C' \preceq x$, the substitution of $x$
for the argument still induces well-formed sentences,
then the induction succeeds. Otherwise, the disjoint
union $C' \oplus C''$ is the closed semantic constraint.
Definition 4.6 The induction of the closed
semantic constraints of $C, D \in \Gamma$ is defined by
$$C \sqcap D = \begin{cases}
\inf(C,D) & \text{if } \forall x\,[\inf(C,D) \preceq x] \text{ succeeds in the substitution} \\
C \oplus D & \text{otherwise}
\end{cases}$$
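The induction of Definition 4.6 can be sketched as follows. The sketch reads the infimum as the nearest common ancestor (with the root as least element) and replaces the corpus-based substitution test by a hypothetical predicate `acceptable`; the toy hierarchy and all names are illustrative assumptions.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical toy noun hierarchy: node -> father (None for the root).
PARENT: Dict[str, Optional[str]] = {
    "entity": None,
    "food": "entity",
    "fruit": "food",
    "apple": "fruit",
    "pear": "fruit",
    "bread": "food",
    "stone": "entity",
}


def ancestors(node: str) -> List[str]:
    """Return node, its father, its grandfather, ... up to the root."""
    chain = [node]
    while PARENT[chain[-1]] is not None:
        chain.append(PARENT[chain[-1]])
    return chain


def inf(c: str, d: str) -> str:
    """Nearest common ancestor of c and d."""
    up_d = set(ancestors(d))
    return next(a for a in ancestors(c) if a in up_d)


def descendants(c: str) -> List[str]:
    """All nodes dominated by c, including c itself."""
    return [x for x in PARENT if c in ancestors(x)]


def induce(c: str, d: str, acceptable: Callable[[str], bool]) -> List[str]:
    """Definition 4.6; a two-element list stands for the disjoint union."""
    g = inf(c, d)
    if all(acceptable(x) for x in descendants(g)):
        return [g]
    return [c, d]


# Hypothetical stand-in for the substitution test of a verb such as "eat".
EDIBLE = {"food", "fruit", "apple", "pear", "bread"}


def can_eat(concept: str) -> bool:
    return concept in EDIBLE


if __name__ == "__main__":
    print(induce("apple", "pear", can_eat), induce("apple", "stone", can_eat))
```

In the toy data, `apple` and `pear` generalize to `fruit`, whereas `apple` and `stone` stay a disjoint union because their common ancestor `entity` dominates concepts that fail the test.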
Definition 4.7 By Theorem 4.1, the induction
between $C \oplus D$ and $E \in \Gamma$ is defined by
$$(C \oplus D) \sqcap E = \begin{cases}
(C \sqcap E) \oplus D & \text{if } S(C,E) \preceq S(D,E) \\
C \oplus (D \sqcap E) & \text{otherwise}
\end{cases}$$
Theoretically, if $C_1 \oplus C_2 \oplus \cdots \oplus C_n$ is the closed
semantic constraint of the argument of $C \in \Gamma_V$, then
$\forall i, \forall x\,[C_i \preceq x]$ succeeds in the substitution. Thus,
in the WordNet-like lexicon, there are $n$ links from
$\Gamma_N$ to $\Gamma_V$ for $C$, where $n$ is called the length of the
constraint. The approach to the closed semantic
constraints of the verb concepts with two arguments
is similar.
4.3 Clustering of Constraints
Definition 4.8 Suppose that there are $N$ arguments
for all verb concepts and the length of the
$i$-th constraint is $l_i$; then
$\bar{l} = \sum_{i=1}^{N} l_i / N$ is called the
average length of the constraints.
$\bar{l}$ indicates the rationality of the concept
classification in a WordNet-like lexicon, and also acts
as an index of the evolution. Our presupposition
is that an optimized lexicon must have
the least average length of the constraints. The
clustering of noun concepts constrained by the
arguments of the verb concepts should be a standard
for the classification of $\Gamma_N$.
Definition 4.9 $S \rightsquigarrow S_1 \rightsquigarrow \cdots \rightsquigarrow S_n \rightsquigarrow \cdots$ is a
Cauchy sequence of evolution iff $\forall \epsilon > 0, \exists N \in \mathbb{N},
\forall i,j > N, \rho(S_i \rightsquigarrow S_j) < \epsilon$.
Theorem 4.2 The Cauchy sequence of evolution
is convergent. And $\forall \epsilon > 0, \exists i,j \in \mathbb{N}$ s.t.
$|\bar{l}(S_i) - \bar{l}(S_j)| < \epsilon$.
$\Gamma_N$ is structured by not only the hypernymy relation
but also the closed semantic constraints. Of
course, the hypernymy relation in $\Gamma_N$ is principal,
but not necessarily unique. As described in
Example 2.1, the distinct angles of view provide
enough space for the evolution. By the hypernymy
relation in $\Gamma_V$, we have
Property 4.1 $\forall C, C' \in \Gamma_V$, $C \preceq C'$, if the closed
semantic constraint of $C'$ is $C_1 \oplus C_2 \oplus \cdots \oplus C_n$,
then $\exists C_{n+1}, \cdots, C_m \in \Gamma_N$ such that
$(((C_1 \oplus C_2 \oplus \cdots \oplus C_n) \sqcap C_{n+1}) \sqcap \cdots \sqcap C_m)$
is the closed semantic constraint of $C$.
This property provides an approach to the empirical
testing of the concept classification of $\Gamma_V$ if $\Gamma_N$
is fixed. Separately, $\Gamma_N$ (or $\Gamma_V$) can be evaluated
by some indexes and evolves to a satisfactory result.
The situation is a little more complicated because the
closed semantic constraints destroy the independent
evolution of $\Gamma_N$ and $\Gamma_V$. If $\Gamma_V$ is fixed, then the
optimization of $\Gamma_N$ may be implemented (though not
completely reliably), and vice versa. It is still an
open problem, however, to define a numerical measure
that could formalize the optimization of the total
structures in a WordNet-like lexicon, especially
$\Gamma_N$ and $\Gamma_V$.
5 Conclusion
A scheme for the closed semantic constraint in a
WordNet-like lexicon based on WSD has been
described as an application-driven construction of
a dynamic lexicon. A further topic is how
rule-based concept tagging benefits from the
descriptions of semantic constraints. The empirical
method is strongly emphasized in WSD and in the
development of the dynamic lexicon, for instance in
the TagSet training, the SynSet testing and the
evolution of a WordNet-like lexicon (see [15], [16]
and [17]). The author believes that the computable
part of Computational Lexicology is nothing but the
evolution of the dynamic lexicon oriented to a
particular application, which is actually the
optimization of the language knowledge base.
Acknowledgement
I appreciate all my colleagues participating in the
CCD project; the blithesome collaboration with
them is always memorable for me. Many thanks to
my friends in the Second and the Third Workshop
on Chinese Lexical Semantics for their kind
discussions with the author. Lastly, my most thankful
words are given to my wife for her long tolerance
of my weaselling out of the housework under
the false pretense of research.

References
[1] ALPAC 1966 Language and Machines: Computers
in Translation and Linguistics, National Research
Council Automatic Language Processing Advisory
Committee, Washington, D.C.
[2] Aristotle 1941 Categoriae, in The Basic
Works of Aristotle, R. McKeon (ed). Random
House, New York.
[3] Beckwith R. 1998 Design and Implementation
of the WordNet Lexical Database and Searching
Software, in [5], pp105-127.
[4] Fellbaum C. 1998 A Semantic Net of English
Verbs, in [5], pp69-104.
[5] Fellbaum C. (ed) 1999 WordNet: An Electronic
Lexical Database, The MIT Press.
[6] Ide N. and Véronis J. 1998 Introduction to
the Special Issue on Word Sense Disambiguation:
The State of the Art, Computational Linguistics,
Vol. 24, No. 1, pp1-40.
[7] Liu Y, Yu S.W. and Yu J.S. Building a Bilingual
WordNet: New Approaches and Algorithms,
accepted by COLING2002.
[8] Manning C.D. and Schütze H. 1999 Foundations
of Statistical Natural Language Processing,
The MIT Press.
[9] Miller G.A. et al 1993 Introduction to WordNet:
An On-line Lexical Database, in the attached
specification of WordNet 1.6.
[10] Miller G.A. 1998 Nouns in WordNet, in [5],
pp23-46.
[11] Priss U. 1999 The Formalization of WordNet
by Methods of Relational Concept Analysis, in
[5], pp179-196.
[12] Vossen P. (ed.) 1998 EuroWordNet: A
Multilingual Database with Lexical Semantic
Networks. Dordrecht: Kluwer.
[13] Yu J.S. and Yu S.W. et al 2001 Introduction
to Chinese Concept Dictionary, in International
Conference on Chinese Computing
(ICCC2001), pp361-367.
[14] Yu J.S. 2001 The Structure of Chinese Concept
Dictionary, accepted by Journal of Chinese
Information Processing, 2001.
[15] Yu J.S. 2001 Evolution of WordNet-like Lexicon,
in The First International Conference of
Global WordNet, Mysore, India, 2002.
[16] Yu J.S. and Yu S.W. 2002 Word Sense
Disambiguation based on Integrated Language
Knowledge Base, in The 2nd International
Conference on East-Asian Language Processing
and Internet Information Technology
(EALPIIT’2002).
[17] Yu J.S. 2002 Statistical Methods in Word
Sense Disambiguation, draft (can be downloaded
from http://icl.pku.edu.cn/yujs/) of
seminar at the Institute of Computational
Linguistics, Peking University.
