<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-1058">
  <Title>Sydney, July 2006. (c) 2006 Association for Computational Linguistics. An Equivalent Pseudoword Solution to Chinese Word Sense Disambiguation</Title>
  <Section position="5" start_page="457" end_page="460" type="metho">
    <SectionTitle>
3 Equivalent Pseudoword
</SectionTitle>
    <Paragraph position="0"> This section describes how to obtain equivalent pseudowords without a seed corpus.</Paragraph>
    <Paragraph position="1"> Monosemous words are unambiguous prior knowledge. According to our statistics, they account for 86% to 89% of the instances in a dictionary and 50% of the items in a running corpus, so they are a potential knowledge source for WSD.</Paragraph>
    <Paragraph position="2"> A monosemous word is usually synonymous with some polysemous words. For example, the words &quot;Xin Shou, Yan Shou, Ke Shou, Zun Zhao, Zun Cong, Zun Xun, Zun Shou&quot; have a meaning similar to one of the senses of the ambiguous word &quot;Bao Shou&quot;, while &quot;Kang Jian, Qiang Jian, Jian Wang, Jian Zhuang, Zhuang Jian, Qiang Zhuang, Jing Zhuang, Zhuang Shi, Dun Shi, Ying Lang, Kang Tai, Jian Lang, Jian Shuo&quot; do the same for &quot;Jian Kang&quot;. This phenomenon is quite common in Chinese and can be used as a knowledge source for WSD.</Paragraph>
    <Section position="1" start_page="457" end_page="458" type="sub_section">
      <SectionTitle>
3.1 Definition of Equivalent Pseudoword
</SectionTitle>
      <Paragraph position="0"> If each ambiguous word in the corpus is replaced with its synonymous monosemous words, does it become convenient to acquire knowledge from a raw corpus? For example, in table 1 the ambiguous word &quot;Ba Wo&quot; has three senses, whose synonymous monosemous words are listed in the right column. These synonyms carry information that is useful for the disambiguation task.</Paragraph>
      <Paragraph position="1"> An artificial ambiguous word can be coined from the monosemous words in table 1. This process is similar to the use of general pseudowords (Gale et al., 1992b; Gaustad, 2001; Nakov and Hearst, 2003), but has some essential differences. The artificial ambiguous word needs to simulate the function of the real ambiguous word and to acquire semantic knowledge as the real ambiguous word does. Thus, we call it an equivalent pseudoword (EP) for its equivalence with the real ambiguous word. The equivalent pseudoword provides a new approach to unsupervised WSD.</Paragraph>
      <Paragraph position="2"> [Table 1 caption, partially lost: ... the Ambiguous Word &quot;Ba Wo&quot;] The equivalence of the EP with the real ambiguous word is a kind of semantic synonymy or similarity, which demands a maximum similarity between the two words. An ambiguous word has the same number of EPs as senses. Each EP's sense maps to a sense of the ambiguous word.</Paragraph>
      <Paragraph position="3"> The semantic equivalence demands further equivalence at each sense level. Every corresponding sense should have the maximum similarity, which is the strictest constraint on the construction of an EP.</Paragraph>
      <Paragraph position="4"> The starting point of unsupervised WSD based on EPs is that an EP can substitute for the original word in knowledge acquisition for model training.</Paragraph>
      <Paragraph position="5"> Every instance of each morpheme of the EP can be viewed as an instance of the ambiguous word, so the training set can be enlarged easily. The EP is thus a remedy for the data sparseness caused by the lack of human tagging in WSD.</Paragraph>
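As a minimal sketch of the substitution idea in this subsection, the Python snippet below relabels every occurrence of an EP morpheme word in a raw, word-segmented corpus as a training instance of the corresponding sense of the ambiguous word. The sense names, the placeholder words, the toy corpus, and the function name `collect_ep_instances` are illustrative assumptions, not part of the method description above.

```python
from typing import Dict, List, Tuple

# Hypothetical EP: each sense of the ambiguous word is represented by
# one or more monosemous morpheme words (placeholders, not real data).
EP: Dict[str, List[str]] = {
    "sense_1": ["mono_a", "mono_b"],
    "sense_2": ["mono_c"],
}


def collect_ep_instances(
    sentences: List[List[str]], ep: Dict[str, List[str]]
) -> List[Tuple[str, List[str]]]:
    """Return (sense, context) pairs harvested from an untagged corpus.

    Every occurrence of a morpheme word is treated as an occurrence of
    the ambiguous word carrying the sense that the morpheme stands for.
    """
    word_to_sense = {w: s for s, words in ep.items() for w in words}
    instances = []
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            sense = word_to_sense.get(tok)
            if sense is not None:
                # Context here is simply the rest of the sentence.
                context = tokens[:i] + tokens[i + 1:]
                instances.append((sense, context))
    return instances


corpus = [
    ["x1", "mono_a", "x2"],
    ["mono_c", "x3"],
    ["x4", "x5"],  # no morpheme word, so no instance is produced
]
instances = collect_ep_instances(corpus, EP)
print(instances)
```

No human tagging is involved: the sense label of each harvested instance comes only from the monosemous word that was found in the text.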
    </Section>
    <Section position="2" start_page="458" end_page="458" type="sub_section">
      <SectionTitle>
3.2 Basic Assumption for EP-based WSD
</SectionTitle>
      <Paragraph position="0"> The claim that EPs can substitute for the original ambiguous word in knowledge acquisition for WSD model training rests on the following assumptions.</Paragraph>
      <Paragraph position="1"> Assumption 1: Words of the same meaning play the same role in a language. The sense is an important attribute of a word. This serves as the basic assumption of this paper.</Paragraph>
      <Paragraph position="2"> Assumption 2: Words of the same meaning occur in similar contexts. This assumption is widely used in semantic analysis and serves as a basis for much related research. For example, some researchers cluster the contexts of ambiguous words for WSD, which shows good performance (Schütze, 1998).</Paragraph>
      <Paragraph position="3"> Because an EP is highly similar to the ambiguous word in syntax and semantics, it is a useful knowledge source for WSD.</Paragraph>
    </Section>
    <Section position="3" start_page="458" end_page="458" type="sub_section">
      <SectionTitle>
3.3 Design and Construction of EPs
</SectionTitle>
      <Paragraph position="0"> Because of the special characteristics of EPs, it is more difficult to construct an EP than a general pseudoword. To ensure the maximum similarity between the EP and the original ambiguous word, the following principles should be followed.</Paragraph>
      <Paragraph position="1"> 1) Every EP should map to one and only one original ambiguous word.</Paragraph>
      <Paragraph position="2"> 2) The morphemes of an EP should map one by one to those of the original ambiguous word. 3) The sense of the EP should be the same as that of the corresponding ambiguous word, or have the maximum similarity with it.</Paragraph>
      <Paragraph position="3"> 4) The morpheme of a pseudoword stands for a sense, while the sense should consist of one or more morphemes.</Paragraph>
      <Paragraph position="4"> 5) The morpheme should be a monosemous word.</Paragraph>
      <Paragraph position="5"> The fourth principle above is the biggest difference between the EP and a general pseudoword. The sense of an EP is composed of one or several morphemes. This is a remarkable feature of the EP, which originates from its equivalence in linguistic function with the original word. To construct the EP, it must be ensured that the sense of the EP maps to that of the original word. Usually, a candidate monosemous word for a morpheme stands for only part of the linguistic function of the ambiguous word, so several morphemes need to be chosen to stand for one sense. The relatedness of the senses refers to the similarity of the contexts of the original ambiguous word and its EP. The similarity between the words means that they serve as synonyms for each other. This principle demands that both semantic and pragmatic information be taken into account in choosing a morpheme word.</Paragraph>
    </Section>
    <Section position="4" start_page="458" end_page="460" type="sub_section">
      <SectionTitle>
3.4 Implementation of the EP-based Solution
</SectionTitle>
      <Paragraph position="0"> An appropriate machine-readable dictionary is needed for the construction of the EPs. A Chinese thesaurus is adopted and revised to meet this demand. Extended Version of TongYiCiCiLin. To extend the TongYiCiCiLin (Cilin) to hold more words, several linguistic resources are adopted for manually adding new words. The resulting extended version of the Cilin includes 77,343 items.</Paragraph>
      <Paragraph position="1"> A hierarchy of three levels is organized in the extended Cilin for all items. Each node in the lowest level, called a minor class, contains several words of the same class. The words in one minor class are divided into several groups according to their sense similarity and relatedness, and each group is further divided into several lines, which can be viewed as the fifth level of the thesaurus. The 5-level hierarchy of the extended Cilin is shown in figure 1. The lower the level is, the more specific the sense is. The fifth level often contains a few words or only one word, and is called an atom word group, an atom class or an atom node. The words in the same atom node hold the smallest semantic distance. From the root node to the leaf node, the sense is described in more and more detail, and the words in the same node are more and more related. Words in the same fifth-level node have the same sense and linguistic function, which ensures that they can substitute for each other without leading to any change in the meaning of a sentence.</Paragraph>
      <Paragraph position="2"> The extended version of Cilin is freely downloadable from the Internet and has been used by over 20 organizations in the world.</Paragraph>
      <Paragraph position="4"> According to the position of the ambiguous word, a proper word is selected as the morpheme of the EP. Almost every ambiguous word has its corresponding EP constructed in this way.</Paragraph>
      <Paragraph position="5"> The first step is to decide the position of the ambiguous word, starting from the leaf node of the tree structure. Words in the same leaf node are identical or similar in linguistic function and word sense. The other words in the leaf node of the ambiguous word are called its brother words. If there is a monosemous brother word, it can be taken as a candidate morpheme for the EP. If no such brother word exists, trace to the fourth level. If there is still no monosemous brother word in the fourth level, trace to the third level. Because every node in the third level contains many words, a candidate morpheme for the ambiguous word can usually be found.</Paragraph>
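The level-by-level search described above can be sketched as follows. This is only an illustration under stated assumptions: each sense position is written as a tuple of five arbitrary labels (one per thesaurus level), a word is taken to be monosemous when it has exactly one position, and the toy data and the names `THESAURUS`, `is_monosemous` and `find_candidates` are hypothetical rather than taken from the extended Cilin.

```python
from typing import Dict, List, Optional, Tuple

Path = Tuple[str, ...]  # one label per level, levels 1 to 5

# Toy thesaurus: word -> list of positions, one per sense (hypothetical).
THESAURUS: Dict[str, List[Path]] = {
    "amb": [("A", "a", "01", "A", "01"), ("B", "b", "02", "A", "01")],
    "mono_1": [("A", "a", "01", "A", "02")],  # shares a 4th-level node
    "mono_2": [("B", "b", "02", "B", "01")],  # shares a 3rd-level node
    "poly": [("A", "a", "01", "A", "01"), ("C", "c", "01", "A", "01")],
}


def is_monosemous(word: str, thesaurus: Dict[str, List[Path]]) -> bool:
    """A word with a single position in the thesaurus has a single sense."""
    return len(thesaurus[word]) == 1


def find_candidates(
    word: str, position: Path, thesaurus: Dict[str, List[Path]]
) -> Tuple[Optional[int], List[str]]:
    """Look for monosemous brother words at level 5, then 4, then 3."""
    for level in (5, 4, 3):
        prefix = position[:level]
        hits = sorted(
            w
            for w, paths in thesaurus.items()
            if w != word
            and is_monosemous(w, thesaurus)
            and paths[0][:level] == prefix
        )
        if hits:
            return level, hits  # stop at the most specific level with a hit
    return None, []


for pos in THESAURUS["amb"]:
    print(pos, find_candidates("amb", pos, THESAURUS))
```

In the toy data the first sense of `amb` only finds a candidate after tracing to the fourth level, and the second sense only at the third level; the polysemous brother word `poly` is never returned.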
      <Paragraph position="6"> In most cases, candidate morphemes can be found at the fifth level. It is not often necessary to search to the fourth level, and even less often to the third. According to our statistics, the extended Cilin contains monosemous brother words for about 93% of the ambiguous words at the fifth level, and for 97% at the fourth level. Only 112 ambiguous words are left, which account for the other 3% and are mainly function words. Some of these words are so rarely used that they cannot be found even in a large corpus. Moreover, words that lead to semantic misunderstanding are usually content words. In WSD research for English, only nouns, verbs, adjectives and adverbs are considered.</Paragraph>
      <Paragraph position="7"> The extended Cilin is located at http://www.ir-lab.org/.</Paragraph>
      <Paragraph position="8"> From this aspect, the extended version of Cilin meets our demand for the construction of EPs.</Paragraph>
      <Paragraph position="9"> If many monosemous brother words are found in the fourth or third level, there are many candidate morphemes to choose from. A further selection is made based on a calculation of sense similarity, and the most similar brother words are chosen. Computing of EPs. Generally, several morpheme words are needed for better construction of an EP. We assume that every morpheme word stands for a specific sense and that the morpheme words do not influence each other. It is more complex to construct an EP than a common pseudoword, and the formulation and statistical information are also different.</Paragraph>
      <Paragraph position="10"> An EP is described as follows:</Paragraph>
      <Paragraph position="12"/>
      <Paragraph position="14"> Here $S_i$ is a sense of the ambiguous word, and $W_{ik}$ is a morpheme word of the EP.</Paragraph>
      <Paragraph position="15"> The statistical information of the EP is calculated as follows: 1) $C(S_i)$ stands for the frequency of the sense $S_i$;</Paragraph>
      <Paragraph position="17"> 2) $C(S_i, W_c)$ stands for the co-occurrence frequency of $S_i$ and the contextual word $W_c$.</Paragraph>
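A possible way to collect these statistics is sketched below. It assumes that the frequency of a sense is the sum of the frequencies of its morpheme words, and that the co-occurrence frequency of a sense with a context word is the corresponding sum over morpheme words. This summation is our reading of the independence assumption stated above, not a formula given in the text, and the function name `ep_statistics`, the whole-sentence context, and the toy data are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def ep_statistics(
    ep: Dict[str, List[str]], sentences: List[List[str]]
) -> Tuple[Counter, Counter]:
    """Count sense frequencies and sense/context-word co-occurrences.

    Assumption: the counts of a sense are obtained by summing the
    counts of all morpheme words that stand for that sense.
    """
    word_to_sense = {w: s for s, words in ep.items() for w in words}
    sense_count: Counter = Counter()
    cooc_count: Counter = Counter()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            sense = word_to_sense.get(tok)
            if sense is None:
                continue
            sense_count[sense] += 1
            # Every other token of the sentence counts as a context word.
            for ctx in tokens[:i] + tokens[i + 1:]:
                cooc_count[(sense, ctx)] += 1
    return sense_count, cooc_count


ep = {"s1": ["a", "b"], "s2": ["c"]}
sentences = [["x", "a", "y"], ["b", "x"], ["c", "y", "y"]]
sense_count, cooc_count = ep_statistics(ep, sentences)
print(sense_count["s1"], cooc_count[("s1", "x")])  # prints: 2 2
```

Both morpheme words `a` and `b` contribute to the counts of `s1`, which is how several monosemous words jointly stand for one sense.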
    </Section>
  </Section>
  <Section position="6" start_page="460" end_page="462" type="metho">
    <SectionTitle>
4 EP-based Unsupervised WSD Method
</SectionTitle>
    <Paragraph position="0"> The EP is a solution to the semantic knowledge acquisition problem, and it does not limit the choice of statistical learning methods. All mathematical modeling methods can be applied to EP-based WSD. This section focuses on the application of the EP concept to WSD, and chooses the Bayesian method for the classifier construction.</Paragraph>
    <Section position="1" start_page="460" end_page="460" type="sub_section">
      <SectionTitle>
4.1 A Sense Classifier Based on the Bayesian Model
</SectionTitle>
      <Paragraph position="0"> Because the model acquires knowledge from the EPs rather than from the original ambiguous word, the method introduced here does not need human tagging of the training corpus.</Paragraph>
      <Paragraph position="1"> In the training stage for WSD, statistics of EPs and context words are obtained and stored in a database. The Senseval-3 data set and an unsupervised learning method are adopted to investigate the value of EPs in WSD. To ensure the comparability of experiment results, a Bayesian classifier is used in the experiments.</Paragraph>
    </Section>
    <Section position="2" start_page="460" end_page="460" type="sub_section">
      <SectionTitle>
Bayesian Classifier
</SectionTitle>
      <Paragraph position="0"> Although the Bayesian classifier is simple, it is quite efficient, and it shows good performance on WSD.</Paragraph>
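      <Paragraph position="1"> The Bayesian classifier used in this paper is $$\hat{S} = \arg\max_{S_i} P(S_i) \prod_{W_c \in C} P(W_c \mid S_i) \qquad (1)$$ where $S_i$ is a candidate sense of the ambiguous word, $W_c$ is a context word, and $C$ is the set of the context words.</Paragraph>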
      <Paragraph position="2"> To simplify the experiment process, Naive Bayesian modeling is adopted for the sense classifier. Feature selection and ensemble classification are not applied, both to simplify the calculation and to demonstrate the effect of EPs in WSD.</Paragraph>
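A Naive Bayesian sense classifier of this kind can be sketched in a few lines. The version below works in log space and uses add-one smoothing; the smoothing scheme, the function name `classify`, the `vocab_size` parameter, and the toy counts are assumptions made for illustration, since the text does not specify them.

```python
import math
from typing import Dict, List, Optional, Tuple


def classify(
    context: List[str],
    sense_count: Dict[str, int],
    cooc_count: Dict[Tuple[str, str], int],
    vocab_size: int,
) -> Optional[str]:
    """Pick the sense maximising log P(S) + sum of log P(w | S)."""
    total = sum(sense_count.values())
    best_sense, best_score = None, -math.inf
    for sense in sorted(sense_count):
        # Number of context tokens seen with this sense, plus the
        # add-one smoothing mass (an assumption, not from the paper).
        denom = vocab_size + sum(
            n for (s, _), n in cooc_count.items() if s == sense
        )
        score = math.log(sense_count[sense] / total)
        for w in context:
            score += math.log((cooc_count.get((sense, w), 0) + 1) / denom)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


# Toy statistics, as they might be gathered from EP instances.
sense_count = {"s1": 2, "s2": 2}
cooc_count = {("s1", "x"): 3, ("s1", "y"): 1, ("s2", "y"): 4}
print(classify(["x"], sense_count, cooc_count, vocab_size=2))
print(classify(["y", "y"], sense_count, cooc_count, vocab_size=2))
```

Because the counts come from EP morpheme words rather than from tagged occurrences of the ambiguous word, the same classifier can be trained without any manual annotation.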
    </Section>
    <Section position="3" start_page="460" end_page="461" type="sub_section">
      <SectionTitle>
Experiment Setup and Results
</SectionTitle>
      <Paragraph position="0"> The Senseval-3 Chinese ambiguous words are taken as the testing set, which includes 20 words, each with 2-8 senses. The data for the ambiguous words are divided into a training set and a testing set at a ratio of 2:1. There are 15-20 training instances for each sense of the words, and each sense occurs with the same frequency in the training and test sets.</Paragraph>
      <Paragraph position="1"> Supervised WSD is first implemented using the Bayesian model on the Senseval-3 data set.</Paragraph>
      <Paragraph position="2"> With a context window of (-10, +10), the open test results are shown in table 2.</Paragraph>
      <Paragraph position="3"> The F-measure in table 2 is defined in (2).</Paragraph>
      <Paragraph position="4"> $$F = \frac{2 \times P \times R}{P + R} \qquad (2)$$ where P and R refer to the precision and recall of the sense tagging respectively, which are calculated as shown in (3) and (4): $$P = \frac{C(\mathrm{correct})}{C(\mathrm{tagged})} \qquad (3)$$ $$R = \frac{C(\mathrm{correct})}{C(\mathrm{all})} \qquad (4)$$ where C(tagged) is the number of tagged instances of senses, C(correct) is the number of correct tags, and C(all) is the number of tags in the gold standard set. Every sense of the ambiguous word has a P value, an R value and an F value. The F value in table 2 is a weighted average over all the senses.</Paragraph>
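The evaluation can be illustrated with a short sketch that computes, for each sense, P as C(correct) divided by C(tagged), R as C(correct) divided by C(all), and F as the harmonic mean of P and R. The text does not say how the average over senses is weighted; weighting each sense by its number of gold-standard instances is an assumption here, and the function name `evaluate` and the toy labels are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def evaluate(
    gold: List[str], predicted: List[Optional[str]]
) -> Tuple[Dict[str, Tuple[float, float, float]], float]:
    """Per-sense (P, R, F) and an F-measure averaged over senses.

    Assumption: the average is weighted by the number of gold
    instances of each sense. None in `predicted` means "not tagged".
    """
    scores = {}
    weighted_sum = 0.0
    for sense in sorted(set(gold)):
        c_tagged = sum(1 for p in predicted if p == sense)
        c_all = sum(1 for g in gold if g == sense)
        c_correct = sum(
            1 for g, p in zip(gold, predicted) if g == sense and p == sense
        )
        p = c_correct / c_tagged if c_tagged else 0.0
        r = c_correct / c_all if c_all else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        scores[sense] = (p, r, f)
        weighted_sum += f * c_all
    return scores, weighted_sum / len(gold)


gold = ["a", "a", "b", "b"]
predicted = ["a", "b", "b", "b"]
scores, weighted_f = evaluate(gold, predicted)
print(scores)
print(round(weighted_f, 4))
```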
      <Paragraph position="5"> In the EP-based unsupervised WSD experiment, a 100M corpus (People's Daily for year 1998) is used for the EP training instances. The Senseval-3 data is used for the test. In our experiments, a context window of (-10, +10) is taken. The detailed results are shown in table 3.</Paragraph>
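For concreteness, a context window of (-10, +10) can be read as the ten tokens before and the ten tokens after the target word, truncated at the sentence boundaries. The helper below is a minimal sketch under that reading; the function name `context_window` and the truncation behaviour are assumptions.

```python
from typing import List


def context_window(tokens: List[str], index: int, size: int = 10) -> List[str]:
    """Return up to `size` tokens on each side of tokens[index]."""
    left = tokens[max(0, index - size):index]
    right = tokens[index + 1:index + 1 + size]
    return left + right


tokens = list("abcdefg")
print(context_window(tokens, 3, size=2))  # prints: ['b', 'c', 'e', 'f']
```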
    </Section>
    <Section position="4" start_page="461" end_page="461" type="sub_section">
      <SectionTitle>
4.2 Experiment Analysis and Discussion
Experiment Evaluation Method
</SectionTitle>
      <Paragraph position="0"> Two evaluation criteria are used in the experiments, namely the F-measure and precision.</Paragraph>
      <Paragraph position="1"> Precision is a common criterion in WSD performance analysis. Only in recent years have precision, recall, and F-measure all been used to evaluate WSD performance.</Paragraph>
      <Paragraph position="2"> In this paper, we only show the F-measure score because it is a combined score of precision and recall.</Paragraph>
    </Section>
    <Section position="5" start_page="461" end_page="462" type="sub_section">
      <SectionTitle>
Result Analysis on Bayesian Supervised WSD
Experiment
</SectionTitle>
      <Paragraph position="0"> The experiment results in table 2 reveal that the results of supervised WSD and those of (Qin and Wang, 2005) are different. Although both are based on the Bayesian model, Qin and Wang (2005) used an ensemble classifier. However, the difference in the average value is not remarkable.</Paragraph>
      <Paragraph position="1"> As introduced above, in the supervised WSD experiment, the various senses of the instances are evenly distributed. The lower bound suggested by Gale et al. (1992c) should therefore be very low, and disambiguation becomes more difficult as the number of senses grows. The experiment verifies this reasoning, because the highest F-measure is less than 90%, and the lowest is less than 60%, averaging about 70%.</Paragraph>
      <Paragraph position="2"> With the same number of senses and the same scale of training data, there is a big difference between the WSD results. This shows that factors other than the number of senses and the training data size influence the performance. For example, the discriminability among the senses is an important factor. The WSD task becomes more difficult if the senses of the ambiguous word are more similar to each other.</Paragraph>
      <Paragraph position="3"> Experiment Analysis of the EP-based WSD. The EP-based unsupervised method takes the same open test set as the supervised method. The unsupervised method shows a better performance, with the highest F-measure score at 100%, the lowest at 59% and the average at 80%. The results show that the EP is useful in unsupervised WSD.</Paragraph>
      <Paragraph position="4"> From the results in table 2 and table 3, it can be seen that 16 of the 20 ambiguous words show better performance in unsupervised WSD than in supervised WSD, while only 2 of them show similar results and 2 perform worse. The average F-measure of the unsupervised method is higher by more than 10%. The reasons are as follows: 1) Because there are several morpheme words for every sense of the word in the construction of the EP, rich semantic information can be acquired in the training step, which is an advantage for sense disambiguation.</Paragraph>
      <Paragraph position="5"> 2) Senseval-3 has provided a small-scale training set, with 15-20 training instances for each sense, which is not enough for the WSD modeling. The lack of training information leads to a low performance of the supervised methods.</Paragraph>
      <Paragraph position="6"> 3) With a large-scale training corpus, the unsupervised WSD method has plenty of training instances for high performance in disambiguation. 4) The discriminability of some ambiguous words may be low, but the corresponding EPs could be easier to disambiguate. For example, the ambiguous word &quot;Chuan&quot; has two senses which are difficult to distinguish from each other, but its EP's senses of &quot;Yue Guo / Chuan Guo / Chuan Yue&quot; and &quot;Chuo / Tong / Tong / Zha&quot; can be easily disambiguated. It is the same for the word &quot;Chong Ji&quot;, whose EP's senses are &quot;Zhuang Ji / Ke Peng / Peng Zhuang&quot; and &quot;Sun Hai / Shang Hai&quot;. EP-based knowledge acquisition for these ambiguous words has helped a lot to achieve high performance.</Paragraph>
    </Section>
  </Section>
</Paper>