<?xml version="1.0" standalone="yes"?>
<Paper uid="W98-0701">
  <Title>General Word Sense Disambiguation Method Based on a Full Sentential Context</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
2 The Task Specification
</SectionTitle>
    <Paragraph position="0"> For our work, we used the word sense definitions as given in WordNet (Miller, 1990), which is comparable to a good printed dictionary in its coverage and distinction of senses. Since WordNet only provides definitions for content words (nouns, verbs, adjectives and adverbs), we are only concerned with identifying the correct senses of the content words.</Paragraph>
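Restricting the task to content words amounts to a part-of-speech filter in front of the disambiguator. The sketch below illustrates one way to do this. It assumes tokens arrive as (word, Penn Treebank tag) pairs. The tag-prefix table `PENN_TO_WORDNET` and the helpers `wordnet_pos` and `content_words` are hypothetical illustrations, not the authors' preprocessing.

```python
# Minimal sketch, assuming tokens arrive as (word, Penn Treebank tag) pairs.
# Hypothetical mapping from Penn tag prefixes to WordNet POS letters;
# this is an illustration, not the paper's preprocessing.
PENN_TO_WORDNET = {"NN": "n", "VB": "v", "JJ": "a", "RB": "r"}


def wordnet_pos(penn_tag: str):
    """Return the WordNet POS letter for a Penn tag, or None for non-content words."""
    return PENN_TO_WORDNET.get(penn_tag[:2])


def content_words(tagged_tokens):
    """Keep only nouns, verbs, adjectives and adverbs, paired with their WordNet POS."""
    result = []
    for word, tag in tagged_tokens:
        pos = wordnet_pos(tag)
        if pos is not None:
            result.append((word, pos))
    return result


sentence = [("The", "DT"), ("jury", "NN"), ("praised", "VBD"),
            ("the", "DT"), ("very", "RB"), ("careful", "JJ"), ("work", "NN")]
print(content_words(sentence))
```

Matching on the first two characters of the tag covers inflected variants such as VBD, NNS or JJR without listing every tag. Lemmatization, which a real WordNet lookup would also need, is left out.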
    <Paragraph position="1"> For both training and testing our algorithm, we used the syntactically analysed sentences of the Brown Corpus (Marcus, 1993), which have been manually tagged with WordNet senses (Miller et al., 1993) to form the semantic concordance files (SemCor).</Paragraph>
    <Paragraph position="2"> These files combine 103 passages of the Brown Corpus with the WordNet lexical database in such a way that every content word in the text carries both a syntactic tag and a semantic tag pointing to the appropriate sense of that word in WordNet. Passages in the Brown Corpus are approximately 2,000 words long, and each contains approximately 1,000 content words.</Paragraph>
    <Paragraph position="3"> The percentages of nouns, verbs, adjectives and adverbs in the semantically tagged corpus, together with their average number of WordNet senses, are given in Table 1. Although most of the words in a dictionary are monosemous, it is the polysemous words that occur most frequently in speech and text. For example, over 80% of the words in WordNet are monosemous, yet almost 78% of the content words in the test corpus had more than one sense, as shown in Table 2.</Paragraph>
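The contrast between dictionary types and corpus tokens comes from counting over different populations. The first statistic counts distinct lemmas. The second counts running words, so a few frequent polysemous lemmas dominate it. The sketch below illustrates the two counts. The sense counts and the token list are made-up values, not WordNet or SemCor data.

```python
# Minimal sketch with hypothetical data: these sense counts are NOT WordNet's;
# they only illustrate type-level vs. token-level polysemy.
sense_counts = {  # lemma -> number of senses (hypothetical)
    "bank": 3, "run": 4,
    "aardvark": 1, "abacus": 1, "zymurgy": 1, "quokka": 1,
    "okapi": 1, "gnomon": 1, "sextant": 1, "lorgnette": 1,
}
# Hypothetical running text: frequent words tend to be the polysemous ones.
corpus = ["bank", "run", "bank", "aardvark", "run",
          "bank", "quokka", "run", "bank", "run"]


def monosemous_type_share(counts):
    """Fraction of dictionary lemmas (types) that have exactly one sense."""
    return sum(1 for n in counts.values() if n == 1) / len(counts)


def polysemous_token_share(tokens, counts):
    """Fraction of corpus tokens whose lemma has more than one sense."""
    return sum(1 for t in tokens if counts[t] > 1) / len(tokens)


print(f"monosemous lemma types:   {monosemous_type_share(sense_counts):.0%}")
print(f"polysemous corpus tokens: {polysemous_token_share(corpus, sense_counts):.0%}")
```

With these toy values, both shares come out at 80%. Most types are monosemous, yet most tokens are polysemous, which mirrors the pattern described above.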
    <Paragraph position="4"> Assigning the most frequent sense (as defined by WordNet) to every content word in the corpus used would result in an accuracy of 75.2%. Our aim is to create a word sense disambiguation system that identifies the correct senses of all content words in a given sentence with an accuracy higher than that achieved by assigning the most frequent sense alone.</Paragraph>
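The most-frequent-sense baseline can be sketched as a simple count. The sketch assumes each tagged token records the sense number of its manual tag, and that WordNet numbers senses so that sense 1 is the most frequent. Under those assumptions, the baseline is correct exactly when the gold sense is 1. The `TaggedToken` type, the helper name and the sample tokens are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TaggedToken:
    lemma: str
    pos: str         # WordNet POS letter: n, v, a or r
    gold_sense: int  # sense number from the manual semantic tag


# Assumption: WordNet sense numbering puts the most frequent sense first.
MOST_FREQUENT_SENSE = 1


def most_frequent_sense_accuracy(tokens):
    """Accuracy of tagging every content word with its WordNet sense 1."""
    if not tokens:
        raise ValueError("no tagged tokens")
    correct = sum(1 for t in tokens if t.gold_sense == MOST_FREQUENT_SENSE)
    return correct / len(tokens)


# Hypothetical sample; gold sense numbers are invented for illustration.
sample = [
    TaggedToken("bank", "n", 1),
    TaggedToken("run", "v", 3),
    TaggedToken("good", "a", 1),
    TaggedToken("quickly", "r", 1),
]
print(f"baseline accuracy: {most_frequent_sense_accuracy(sample):.1%}")
```

The four-token sample gives 75.0% only by construction. Computed over the full set of tagged content words, this count corresponds to the kind of baseline figure quoted above, which any proposed system has to beat.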
  </Section>
</Paper>