<?xml version="1.0" standalone="yes"?> <Paper uid="W98-0703"> <Title>Word Sense Disambiguation based on Semantic Density</Title> <Section position="3" start_page="0" end_page="17" type="intro"> <SectionTitle> 2 Our approach </SectionTitle> <Paragraph position="0"> The approach described in this paper is based on the idea of semantic density. This can be measured by the number of common words that are within a semantic distance of two or more words. The closer the semantic relationship between two words, the higher the semantic density between them. As defined here, semantic density works well in the case of a uniform MRD. In reality there are gaps in the knowledge representation, and semantic density can provide only an estimate of the actual semantic relatedness between words.</Paragraph> <Paragraph position="1"> We introduce semantic density because it is relatively easy to measure on an MRD such as WordNet. This is done by counting the number of concepts two words have in common. A metric is introduced for this purpose which, when applied to all possible combinations of the senses of two or more words, ranks them.</Paragraph> <Paragraph position="4"> Another idea of this paper is to use the Internet as a raw corpus. Thus we have two sources of information: (1) the Internet for gathering statistics, and (2) WordNet for measuring semantic density.</Paragraph> <Paragraph position="5"> As will be shown below, a ranking of word senses results from each of these two sources. The issue then is how to combine these two rankings into an overall ranking. One possibility is to use them in parallel; the other is to use them serially. We have tried both, and the serial approach provided better results. 
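The concept-counting idea above can be illustrated with a small sketch. The hypernym sets below are hand-made toy data, not the paper's actual metric or the real WordNet hierarchy; the point is only the mechanism: score a sense pair by the number of concepts the two senses share, and rank all sense combinations by that score.

```python
from itertools import product

# Toy hypernym data (hypothetical, for illustration only); a real
# system would read these concept chains from an MRD such as WordNet.
HYPERNYMS = {
    "report#1": {"document", "communication", "content"},
    "report#2": {"sound", "event", "happening"},
    "account#1": {"statement", "communication", "content"},
    "account#2": {"record", "evidence"},
}

def semantic_density(sense_a, sense_b):
    """Count the concepts the two senses have in common."""
    return len(HYPERNYMS[sense_a].intersection(HYPERNYMS[sense_b]))

def rank_sense_pairs(senses_a, senses_b):
    """Rank all sense combinations, densest first."""
    pairs = product(senses_a, senses_b)
    return sorted(pairs, key=lambda p: semantic_density(*p), reverse=True)
```

With the toy data, the pair (report#1, account#1) shares two concepts and therefore ranks first among the four combinations.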
Thus, for a verb-noun pair, the WSD method consists of two algorithms: the first ranks the noun senses, of which we retain only the best two; the second takes the output of the first and ranks the pairs of verb-noun senses. Extensions of this method to pairs other than verb-noun are discussed, and larger windows of more than two words are considered.</Paragraph> <Paragraph position="6"> An essential aspect of the WSD method presented here is that we provide a ranking of possible associations between words instead of a binary yes/no decision for each possible sense combination. This allows for controllable precision, as other modules may later be able to distinguish the correct sense association from such a small pool.</Paragraph> <Paragraph position="7"> WordNet is a fine-grained MRD, and this makes it more difficult to pinpoint the correct sense combination, since there are many to choose from and many are semantically close. For applications such as machine translation, fine-grained disambiguation works well, but for information extraction and some other applications it is overkill, and some senses may be lumped together.</Paragraph> <Paragraph position="8"> A simple sentence or question can usually be briefly described by an action and an object; for example, the main idea of the sentence He has to investigate all the reports can be described by the action-object pair investigate-report. Even if the phrase is ambiguous because of poor context, the results of a search or interface based on such a sentence can still be improved if the possible associations between the senses of the verb and the noun are determined.</Paragraph> <Paragraph position="9"> In WordNet (Miller 1990), the gloss of a verb synset provides a noun-context for that verb, i.e. the possible nouns occurring in the context of that particular verb. 
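The serial, two-stage scheme described earlier in this section can be sketched as follows. The scoring functions here are toy stand-ins, not the paper's actual algorithms: Algorithm 1 ranks the noun senses and keeps the best two, and Algorithm 2 ranks verb-noun sense pairs restricted to those survivors.

```python
from itertools import product

def rank_noun_senses(noun_senses, noun_score):
    """Algorithm 1 (sketch): rank noun senses, keep only the best two."""
    ranked = sorted(noun_senses, key=noun_score, reverse=True)
    return ranked[:2]

def rank_pairs(verb_senses, noun_senses, noun_score, pair_score):
    """Algorithm 2 (sketch): rank verb-noun sense pairs over the two
    noun senses surviving Algorithm 1 (the serial combination)."""
    top_nouns = rank_noun_senses(noun_senses, noun_score)
    pairs = product(verb_senses, top_nouns)
    return sorted(pairs, key=lambda p: pair_score(*p), reverse=True)
```

In practice the first score would come from corpus statistics and the second from the semantic density metric; here both are arbitrary lookup tables supplied by the caller.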
The glosses are used here in the same way a corpus is used.</Paragraph> </Section></Paper>