File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/03/w03-1716_metho.xml
Size: 7,113 bytes
Last Modified: 2025-10-06 14:08:37
<?xml version="1.0" standalone="yes"?> <Paper uid="W03-1716"> <Title>The Semantic Knowledge-base of Contemporary Chinese and its Applications in WSD [?]</Title> <Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> DEFINITION Sense definition SEMANTIC CATEGORY </SectionTitle> <Paragraph position="0"> Semantic categories of each word or idiom. A word can be tagged with two or more semantic categories. For instance, the noun Qing Cai (greengrocery) belongs to the plant | food categories.</Paragraph> <Paragraph position="1"> VALENCE Valence number of each entry. For example, Ke To sum up, the above attributes fall into the following five kinds of information: (1) Basic information about the entry, such as vocabulary item, part of speech, sub-category, homograph and pronunciation; (2) Descriptions of word meaning, including sense number, definition, and semantic categories; (3) Semantic valence, thematic roles and combinatorial properties of each word; this is the most important part of SKCC and is especially useful for WSD and lexical semantics research; (4) English translation and its POS tag. If a Chinese word has two or more English counterparts, each counterpart is treated as a separate entry, and the collocation information is also given in the relevant fields.
This can significantly improve the quality of Chinese-English MT systems.</Paragraph> <Paragraph position="2"> (5) Corpus-derived authentic examples of a word in context, showing how it is used, how phrases are formed around it, and so on.</Paragraph> </Section> <Section position="6" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 Application in WSD </SectionTitle> <Paragraph position="0"> As a large-scale lexical knowledge base, SKCC combines the features of many of the other resources commonly exploited in NLP work: it includes definitions and English translations for the individual senses of the words within it, as in a bilingual dictionary; it organizes lexical concepts into a conceptual hierarchy, like a thesaurus; and it includes other links among words according to several semantic relations, including semantic role and collocation information. As such, it currently provides the broadest set of lexical information in a single resource. The kind of information recorded and made available through SKCC is usable for various NLP applications, including machine translation, automatic abstraction, information retrieval, hypertext navigation, thematic analysis, and text processing.</Paragraph> <Paragraph position="1"> In this section, we focus on the automatic disambiguation of Chinese word senses using SKCC, since this task is the most troublesome and is essential to all of the above NLP applications (Ide, 1998).</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.1 Determination of the polysemous words and homographs </SectionTitle> <Paragraph position="0"> In general terms, the word sense disambiguation (WSD) task necessarily involves two steps: (1) the determination of all the polysemous words and homographs in the text or discourse; and (2) a means to assign each occurrence of a word to the appropriate sense.</Paragraph> <Paragraph position="1"> Step (1) can be easily accomplished by relying on SKCC.
In SKCC, each entry denotes a single sense of a word. Thus, if a word has two or more senses, it is recorded as separate entries, and the SENSE field of each entry is filled with a different number (as for &quot;Cai&quot; in Table 3). Therefore, if either the SENSE or the HOMOMORPHISM field of an entry has a value in SKCC, the entry must be a polysemous word or a homograph.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.2 WSD based on semantic categories </SectionTitle> <Paragraph position="0"> The senses of most Chinese polysemous words and homographs belong to different semantic categories and have different syntagmatic features in context (Wang Hui, 2002). SKCC gives a detailed description of such information in the AGENT and/or OBJECT fields, as illustrated in Table 5.</Paragraph> </Section> </Section> <Section position="7" start_page="0" end_page="0" type="metho"> <SectionTitle> TRANSLATION light slack </SectionTitle> <Paragraph position="0"> Table 5: Polysemous adjectives in SKCC. Based on the above description, the target word Qing Dan in the following POS-tagged text can be accurately disambiguated: [1] [?] /m Bei /q Qing Dan /a De /u Long Jing Cha /n A cup of light Longjing tea.</Paragraph> <Paragraph position="1"> [2] Nong Mang Shi /t Jin Cheng /v De /u Ren /n Bu /d Duo /a, Sheng Yi /n Bi Jiao /d Qing Dan /a.</Paragraph> <Paragraph position="2"> In the busy farming season, few people go to town and business is rather slack.</Paragraph> <Paragraph position="3"> In sentence [1], the word modified by Qing Dan is the noun Cha (tea), which is a kind of drink, while the word Qing Dan in sentence [2] is the predicate of &quot;business&quot;. From the different values in the AGENT field, it is easy to judge that these two occurrences of Qing Dan have different senses, viz.
the former is light and the latter is slack.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.3 WSD based on collocation information </SectionTitle> <Paragraph position="0"> For polysemous words or homographs whose senses belong to the same semantic category, the difference between the senses usually manifests itself at the collocation level. According to a study in cognitive science, people often disambiguate a word sense using only a few other words in a given context, frequently only one additional word (Choueka, 1983). Thus, the relationships between one word and others can be effectively used to resolve ambiguity. For example, the Chinese verb Zhao has two senses: one is Xun Zhao (look for) and the other is Tui Huan (give change). Only when the verb co-occurs with the noun Qian (money) can it be interpreted as give change; otherwise, it means look for (see Table 6).</Paragraph> </Section> </Section> <Section position="8" start_page="0" end_page="0" type="metho"> <SectionTitle> TRANSLATION look for give change </SectionTitle> <Paragraph position="0"> Table 6: Different senses of the verb Zhao. According to Table 6, the verb Zhao in sentence [1] below must mean look for, because its object is Ren (person), a kind of entity, while Zhao in sentence [2] has two objects, namely the indirect object Wo (me) and the direct object Qian (money). Thus, its meaning is give change.</Paragraph> <Paragraph position="1"> [1] Ta Men /r Jiang /d Chu Qu /v Zhao /v Ren /n.</Paragraph> <Paragraph position="2"> They will go out to look for someone.</Paragraph> <Paragraph position="3"> [2] Shou Huo Yuan /n Huan /d Mei You /d Zhao /v Wo /r Qian /n Ni The seller has not yet given me my change.</Paragraph> <Paragraph position="4"> By making full use of SKCC and a large-scale POS-tagged corpus of Chinese, a multi-level WSD model has been developed and is already used in a Chinese-English MT application.</Paragraph> </Section> </Paper>
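The two disambiguation levels described in Sections 3.2 and 3.3 can be illustrated with a minimal Python sketch. It is not the authors' model. The `Entry` layout, the field names `arg_categories` and `collocates`, the category labels, and all toy entries are assumptions made for illustration, since the contents of Tables 5 and 6 are not reproduced in this text. Checking collocates before semantic categories is likewise a design choice of the sketch, made because a listed collocate is the more specific cue.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One sense of one word, loosely modelled on an SKCC entry (hypothetical layout)."""
    word: str
    sense: int
    translation: str
    arg_categories: frozenset = frozenset()  # categories accepted in AGENT/OBJECT
    collocates: frozenset = frozenset()      # words that select this sense


# Toy data invented for illustration; not copied from SKCC.
LEXICON = [
    Entry("Qing Dan", 1, "light", arg_categories=frozenset({"drink"})),
    Entry("Qing Dan", 2, "slack", arg_categories=frozenset({"business"})),
    Entry("Zhao", 1, "look for", arg_categories=frozenset({"entity"})),
    Entry("Zhao", 2, "give change", collocates=frozenset({"Qian"})),
]

# Hypothetical semantic categories of the context words.
CATEGORY_OF = {
    "Long Jing Cha": "drink",
    "Sheng Yi": "business",
    "Ren": "entity",
    "Wo": "entity",
    "Qian": "entity",
}


def polysemous_words(lexicon=LEXICON):
    """Step (1): words that occur in more than one entry."""
    seen, repeated = set(), set()
    for entry in lexicon:
        (repeated if entry.word in seen else seen).add(entry.word)
    return repeated


def disambiguate(word, context_words, lexicon=LEXICON, category_of=CATEGORY_OF):
    """Step (2): choose an entry for `word` from the words it combines with."""
    candidates = [e for e in lexicon if e.word == word]
    context = set(context_words)
    # Collocation level: a listed collocate in the context decides the sense.
    for entry in candidates:
        if entry.collocates & context:
            return entry
    # Semantic-category level: match the category of an argument word.
    context_categories = {category_of[w] for w in context if w in category_of}
    for entry in candidates:
        if entry.arg_categories & context_categories:
            return entry
    return None  # no evidence either way; the word stays ambiguous


print(disambiguate("Qing Dan", ["Long Jing Cha"]).translation)
print(disambiguate("Zhao", ["Wo", "Qian"]).translation)
```

With the toy data above, the two calls select the senses glossed as light and give change, matching the analysis of the example sentences. A fuller version would read the argument words from a parsed, POS-tagged sentence rather than from a hand-written list.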