<?xml version="1.0" standalone="yes"?>
<Paper uid="W02-1816">
  <Title>WSD and Closed Semantic Constraint</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 Dynamic Lexicon and Its Evolution
</SectionTitle>
    <Paragraph position="0"> 1. S is a well-structured set with a type1 t, 2. R is the set of deductive rules on S, and 3. T is the set of all structural transformations of S, keeping the type t.</Paragraph>
    <Paragraph position="1">  Definition 2.2 Lexicon &lt;Sprime,R,T&gt; is called the evolution result of the lexicon &lt;S,R,T&gt; if [?]t [?] T[?] such that S t- Sprime (or briefly S squiggleright Sprime). The process of a dynamic lexicon to its evolution result is called an evolution. Obviously, T is a group with the operation of composition.</Paragraph>
    <Paragraph position="2"> Definition 2.3 &lt;S,R,T&gt; is called simple structured if T is a commutative group, otherwise complex structured.</Paragraph>
    <Paragraph position="3"> The more complex is the structure of S, the more difficult are the applications of R and T. Since some part of semantic knowledge is represented by the structure, the complexity balance between the structure and R (or T) is one of the serious problems in Computational Lexicology.</Paragraph>
    <Paragraph position="4"> Definition 2.4 Let Ohm(S) denote the least number of operations constructing S, and Ohm(S squigglerightSprime) the least number of operations from S to Sprime. It's easy to verify that Theorem 2.1 Ohm(*) is a distance, i.e., it satisfies  that [?]S,Sprime,Sprimeprime, 1. Ohm(SsquigglerightSprime) [?] 0 2. Ohm(SsquigglerightSprime) = 0 = S = Sprime 3. Ohm(SsquigglerightSprime) = Ohm(SprimesquigglerightS) 4. Ohm(SsquigglerightSprimeprime) [?] Ohm(SsquigglerightSprime)+Ohm(SprimesquigglerightSprimeprime)  Corollary 2.1 Ohm(Sprime) [?] Ohm(S)+Ohm(SsquigglerightSprime) Definition 2.5 The degree of structural destruction from S to Sprime, is defined by</Paragraph>
    <Paragraph position="6"> Definition 2.6 Let S squiggleright S1 squiggleright *** squiggleright Sn squiggleright *** be a sequence of evolution, the sequence is called convergent if there exists a constant A s.t. 0 [?] A [?] 1 and limn-[?]r(SsquigglerightSn) = A.</Paragraph>
    <Paragraph position="7"> It's easy to see that a local evolution of the lexicon may not be an optimization even for a specific application. The index r indicates the convergence of lexical structure, guaranteeing a stable machine learning of the dynamic lexicon. Actually, the structure of the so-called common knowledge is nothing but a statistical distribution, which is effected by the cultures and personal experiences.</Paragraph>
    <Paragraph position="8"> Oriented to a particular application, such as IE, IR, MT, etc, the appropriate semantic descriptions in a WordNet-like lexicon seem necessary.</Paragraph>
    <Paragraph position="9"> Example 2.1 C = {earthquake, quake, temblor, seism} is not only a kind of Cprime = {geological phenomenon}, but also a kind of Cprimeprime = {natural disaster}.</Paragraph>
  </Section>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 WSD Based on a WordNet-like Lexicon
</SectionTitle>
    <Paragraph position="0"> like Lexicon What does it mean that a machine could understand a given sentence S or a text T? As we know, Turing Test of NLU includes at least the meaning of any word w in S or T. Thus, the pre-requisite WSD is to tag the semantic information of w automatically. WordNet2 in Princeton University, in despite of its disputed quality, provides an approach to the formalization of concepts in natural language, in which a concept is defined by a synonym set (SynSet). A more important work in WordNet is the construction of a well-structured concept network based on the hypernymy relation (the main framework) and other accessorial relations, such as, the opposite relation, the holonymy  relation, entailment, cause, etc.</Paragraph>
    <Paragraph position="1"> Definition 3.1 A WordNet-like lexicon is a dynamic lexicon with the type of WordNet:  1. restricted to each category, S is a labeled tree from the viewpoint of the hypernymy relation for both noun concepts and verb concepts, 2The specification of WordNet could be found in [3], [4], [5], [9], [10], [11], etc.</Paragraph>
    <Paragraph position="2"> 2. some accessorial relations between the noun (or verb) concepts, and 3. closed semantic constraint of the argument(s)  of each verb concept from the noun concepts. The WordNet-like lexicon is complex structured, it may not have the same ontology of WordNet, neither the semantic knowledge representations. But the description method seems a general format for all languages from the fact of EuroWordNet (see [12]), Chinese Concept Dictionary (CCD, see [7], [13], [14] and [15]), Korean WordNet, Tamil Word-Net, etc.</Paragraph>
    <Paragraph position="3"> Definition 3.2 Let S be the set of all words, then G, the set of all concepts (or SynSets) in a WordNet-like lexicon, is a subset of 2S. The set of all SynSets containing w is denoted by [?](w), in which each element is called a sense of w.</Paragraph>
    <Paragraph position="4"> Definition 3.3 Given a well-defined sentence S = w1w2 ***wn, WSD is the computable processing which tags wi a unique sense si = {wi,wi1,*** ,wik} such that each derived combinatorial path is a well-defined sentence with the semantics of S. The Principle of Substitution provides a corpus-based empirical approach to test a SynSet well-defined or not. The SynSet is the smallest unit in a WordNet-like lexicon, which is the underlying of the structural descriptions between the concepts.</Paragraph>
    <Paragraph position="5"> The training of concept TagSet and the statistical model of WSD are interactional, which is the main idea of our approach to WSD based on a WordNet-like lexicon.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.1 The Training of TagSet
</SectionTitle>
      <Paragraph position="0"> The traditional semantic tags are from some ontology, the apriority of which is often criticized by computational linguists. For us, the empirical method must impenetrate each step of WSD because of the complexity of language knowledge.</Paragraph>
      <Paragraph position="1"> The statistical approach to WSD needs a well concept-tagged corpus as the training set for the concept TagSet and the statistical data in the Hidden Markov Model (HMM). To avoid the sparse data problem, only a few real subsets of G could act as the TagSet in the statistical model (see [15] and [16]). The first step leads to a set of structured TagSets {T1,T2,*** ,Tm}, then the second step is to choose the most efficient one which makes the best accuracy of the statistical concept tagging.</Paragraph>
      <Paragraph position="2"> Different from those unframed tags, the deductive rule along the hypernymy trees works out the sense of w by the following property: Property 3.1 Suppose that the TagSet is T = {C1,C2,*** ,Ck}, and the word w in a given sentence is tagged by Ci, then the sense of w here is the SynSet C which satisfies that Ci precedesequal C and w [?] C, whereprecedesequalis the partial ordering of the nodes in the hypernymy tree.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.2 Statistical Model of WSD
</SectionTitle>
      <Paragraph position="0"> In some sense, WSD is the kernel problem of both NLU and NLP ([1], [6], [8]). POS and concept tag are two random variables in the HMM of WSD.</Paragraph>
      <Paragraph position="1"> Sometimes POS of w determines its sense, sometimes not. But in most cases, a sense of w implies a unique POS. The distribution of w's senses with the POS, P, is important in the (POS,concept)tagging. A Hidden Markov Model with two parameters will be adopted as the main statistical model for WSD, and the Statistical Decision Theory and Bayesian Analysis, which are good at analyzing the small samples, conducted as a comparison. The training corpus, T, is done by hand, where the cursor sensitive display of the senses provides the help information.</Paragraph>
      <Paragraph position="3"> be a possible POS tagged result, where i [?] I. Define</Paragraph>
      <Paragraph position="5"> (2) The HMM of concept can simulate the HMM with two parameters of (POS,concept). f(i) in (2) is predigested to</Paragraph>
      <Paragraph position="7"> Property 3.2 There exists a unique map g from the set of {P(i)1 P(i)2 ***P(i)n |i [?] I} to the set of</Paragraph>
      <Paragraph position="9"> where [?]i,k,[?]C [?] [?](wk) s.t. C(i,f(i))k precedesequal C. If there is Cprime negationslash= C satisfying Cprime [?] [?](wk) and C(i,f(i))k precedesequal Cprime, then the one with more distribution is the selected sense of wk.</Paragraph>
      <Paragraph position="10"> Property 3.3 Let s = w1w2 ***wn be any possible segmented sequence of S, corresponding a set of probabilities of POS sequences As = {P(P(i)1 P(i)2 ***P(i)n )|i [?] I}. Each P(i)1 P(i)2 ***P(i)n corresponds a set of probabilities of concept sequences B(i)s = {P(C(i,j)1 C(i,j)2 ***C(i,j)n )|j [?] J}, where C(i,j)k has the POS of P(i)k , then</Paragraph>
      <Paragraph position="12"> is the choice of segmentation, where a &gt; 0,b &gt; 0 and a+b = 1. More precisely, (5) is rewritten by</Paragraph>
      <Paragraph position="14"/>
    </Section>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 WSD-driven Closed Semantic Constraint
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
tic Constraint
</SectionTitle>
      <Paragraph position="0"> From the corpus and the statistical WSD, we can make an induction of the arguments along the hypernymy tree, which leads to the closed semantic constraints automatically. At the same time, the closed semantic constraints also provide a possible approach to the empirical optimization of GN and GV . While the total optimization of a WordNet-like lexicon is still an open problem.</Paragraph>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.1 Similarity between Concepts
</SectionTitle>
      <Paragraph position="0"> tree, in which the label map is one-to-one. Always, we presume that the hypernymy tree is not degenerative.</Paragraph>
      <Paragraph position="1"> In a hypernymy tree of a WordNet-like lexicon, a node is a code and a label is a SynSet. Since the label map is injective, without generality, a SynSet is usually denoted by a node. We assume that the precedence relation between the brother nodes always implies an ordering of time, usage, frequency, mood, etc. For instance, {spring,springtime} [?] {summer,summertime} [?] {fall,autumn} [?] {winter,wintertime} as the hyponyms of {season,time of year}.</Paragraph>
      <Paragraph position="2"> Definition 4.3 Let f,b and B denote father, the nearest younger-brother and the nearest elderbrother respectively, satisfying that f = fb,f = fB and Bb = bB = 1.</Paragraph>
      <Paragraph position="3"> Definition 4.4 [?]x,y [?] N, let z [?] N be their nearest ancestor satisfying z = fm(x) and z = fn(y),D(x,y) def= m + n. k [?] N is called the offset of x from its eldest brother if [?]Bk(x) and notexistentialBk+1(x). Let the offset of y is l, the similarity between x and y is:  ordered set.</Paragraph>
      <Paragraph position="4"> The elementary structural transformations in a WordNet-like lexicon include:</Paragraph>
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.2 Induction of Constraints
</SectionTitle>
      <Paragraph position="0"> GN (or GV ) denotes the set of noun (or verb) concepts. Let C [?] GV be a verb concept with one argument. Suppose that we have gotten the initial closed semantic constraint of its argument, Cprime [?] GN, from a concept-tagged sentence. A link from Cprime to C is added between GN and GV .</Paragraph>
      <Paragraph position="1"> If Cprimeprime from another sentence is also a close semantic constraint of C's argument, then the infimum of Cprime and Cprimeprime,inf(Cprime,Cprimeprime), is the new Cprime. [?]x [?] G,Cprime precedesequal x, if the substitution from C to x still induces well-formed sentences, then the induction succeeds. Otherwise, the disjointed union Cprime[?]Cprimeprime is the closed semantic constraint.</Paragraph>
      <Paragraph position="2"> Definition 4.6 The induction of the closed semantic constraints of C,D [?] G is defined by</Paragraph>
      <Paragraph position="4"> succeeds in the substitution C [?]D otherwise Definition 4.7 By Theorem 4.1, the induction between C [?]D and E [?] G is defined by</Paragraph>
      <Paragraph position="6"> Theoretically, if C1[?]C2[?]***[?]Cn is the closed semantic constraint of the argument of C [?] GV , then [?]i,[?]x[Ci precedesequal x] succeeds in the substitution. Thus, in the WordNet-like lexicon, there are n links from GN to GV for C, where n is called the length of the constraint. The approach to the closed semantic constraints of the verb concepts with two arguments is similar.</Paragraph>
    </Section>
    <Section position="4" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.3 Clustering of Constraints
</SectionTitle>
      <Paragraph position="0"> Definition 4.8 Suppose that there are N arguments for all verb concepts and the length of the i-th constraint is li, then -l =</Paragraph>
      <Paragraph position="2"> average length of the constraints.</Paragraph>
      <Paragraph position="3"> -l indicates the rationality of the concept classification in a WordNet-like lexicon, which also acts as an index of the evolution. Our presupposition is that the optimization of the lexicon must have the least average length of the constraints. The clustering of noun concepts constrained by the arguments of the verb concepts should be a standard of the classification of GN.</Paragraph>
      <Paragraph position="4">  Definition 4.9 S squiggleright S1 squiggleright *** squiggleright Sn squiggleright *** is Cauchy sequence of evolution iff [?]epsilon1 &gt; 0,[?]N [?] N,[?]i,j &gt; N,r(SisquigglerightSj) &lt; epsilon1.</Paragraph>
      <Paragraph position="5"> Theorem 4.2 The Cauchy sequence of evolution is convergent. And [?]epsilon1 &gt; 0,[?]i,j [?] N s.t. |-l(Si) [?]  -l(Sj) |&lt; epsilon1.</Paragraph>
      <Paragraph position="6"> GN is structured by not only the hypernymy relation but also the closed semantic constraints. Of course, the hypernymy relation in GN is principal, but not necessarily unique. As described in Example 2.1, the distinct angles of view provide enough space for the evolution. By the hypernymy relation in GV , we have Property 4.1 [?]C,Cprime [?] GV ,C precedesequal Cprime, if the closed semantic constraint of Cprime is C1 [?] C2 [?] *** [?] Cn, then [?]Cn+1,*** ,Cm [?] GN such that (((C1 [?]C2 [?] ***[?]Cn)intersectionsqCn+1)intersectionsq***intersectionsqCm) is the closed semantic constraint of C.</Paragraph>
      <Paragraph position="7"> This property provides an approach to the empirical testing of the concept classification of GV if GN is fixed. Separately, GN (or GV ) can be evaluated by some indexes and evolves to a satisfiable result. A little more complicated, the closed semantic constraints destroy the independent evolution of GN and GV . If GV is fixed, then the optimization of GN may be implemented (but not completely reliable) and vice versa. While it is still an open problem to define a numerical measure that could formalize the optimization of the total structures in a WordNet-like lexicon, especially GN and GV .</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>