<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-0854">
  <Title>KUNLP System in SENSEVAL-3</Title>
  <Section position="3" start_page="0" end_page="2" type="metho">
    <SectionTitle>
2 KUNLP system
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="2" type="sub_section">
      <SectionTitle>
2.1 Word Sense Disambiguation
</SectionTitle>
      <Paragraph position="0"> We disambiguate senses of a word in a context  by selecting a substituent word from the WordNet  relatives of the target word. Figure 1 represents a flowchart of the proposed approach. Given a target word and its context, a set of relatives of the target word is created by searches in WordNet. Next, the most appropriate relative that can be substituted for the word in the context is chosen. In this step, co-occurrence frequency is used. Finally, the sense of the target word that is related to the selected relative is determined.</Paragraph>
      <Paragraph position="1"> The example in Figure 2 illustrates how the proposed approach disambiguates senses of the target word chair given the context. The set of relatives {president, professorship, ...} of chair is built by WordNet searches, and the probability Pr(professorship|Context) that a relative can be substituted for the target word in the given context is estimated by the co-occurrence frequency between the relative and each of the context words. (In this paper, a context indicates a target word and the six words surrounding it in an instance.) In this example, the relative seat is selected with the highest probability, and the proper sense, "a seat for one person, with a support for the back," is chosen. Thus, the second step of our system (i.e., selecting a relative) has to be carefully implemented to select the proper relative that can substitute for the target word in the context, while the first step (i.e., acquiring the set of relatives) and the third step (i.e., determining a sense) are done simply through searches in WordNet.</Paragraph>
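The three-step procedure above can be sketched in a few lines of Python. Everything here is a toy stand-in: the relative sets, glosses, co-occurrence counts, and the add-one smoothing are illustrative assumptions, not the paper's data or implementation.

```python
# Step 1 (assumed data): relatives of the target word and the sense
# each relative points back to, as WordNet lookups would return them.
RELATIVES = {
    "chair": {
        "seat":          "a seat for one person, with a support for the back",
        "president":     "the officer who presides at meetings",
        "professorship": "the position of professor",
    }
}

# Toy co-occurrence counts between relatives and context words.
COOC = {
    ("seat", "sit"): 8, ("seat", "back"): 5,
    ("president", "sit"): 1, ("professorship", "sit"): 0,
}

def cooc(rel, c):
    return COOC.get((rel, c), 0)

def disambiguate(target, context):
    """Pick the sense whose relative best substitutes for `target`."""
    best_rel, best_score = None, -1.0
    for rel in RELATIVES[target]:
        # Step 2: score each relative by its co-occurrence with the
        # context words (a smoothed product, to avoid zeroing out).
        score = 1.0
        for c in context:
            score *= (cooc(rel, c) + 1)
        if score > best_score:
            best_rel, best_score = rel, score
    # Step 3: map the winning relative back to its sense.
    return best_rel, RELATIVES[target][best_rel]

rel, sense = disambiguate("chair", ["sit", "back"])
```

As in the chair example, the relative seat wins because it co-occurs most often with the context words, and its associated sense is returned.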
      <Paragraph position="2"> The substituent word of the i-th target word tw_i in a context C is defined to be the relative of tw_i that has the largest co-occurrence probability with the words in the context (Equation 2). [Figure 2: the sense disambiguation procedure for chair] Equation 2 may then be calculated under the assumption that the words in C occur independently (Equation 3), where c_k is the k-th word in C and n is the number of words in C.</Paragraph>
      <Paragraph position="3"> The first probability in Equation 3 is calculated from the co-occurrence frequencies described in Section 2.2. In the case that several relatives share the largest co-occurrence probability, all senses related to those relatives are determined as proper senses.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="2" end_page="2" type="metho">
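The paper's equations are not reproduced in this extraction, but the independence assumption over context words typically yields a naive-Bayes-style product. The sketch below shows one plausible realization in log space; the probability tables are invented numbers for illustration only.

```python
import math

def log_pr_relative_given_context(rel, context, pr_word_given_rel, pr_rel):
    """Score log Pr(rel) + sum_k log Pr(c_k | rel): the independence
    assumption turns the joint context probability into a product."""
    s = math.log(pr_rel[rel])
    for c in context:
        s += math.log(pr_word_given_rel[(c, rel)])
    return s

# Toy distributions (assumed, not estimated from any corpus).
pr_rel = {"seat": 0.5, "president": 0.5}
pr_w = {("sit", "seat"): 0.4, ("back", "seat"): 0.3,
        ("sit", "president"): 0.05, ("back", "president"): 0.05}

scores = {r: log_pr_relative_given_context(r, ["sit", "back"], pr_w, pr_rel)
          for r in pr_rel}
best = max(scores, key=scores.get)
```

Working in log space avoids underflow when the context contains many words, a common choice for products of small probabilities.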
    <Section position="1" start_page="2" end_page="2" type="sub_section">
      <SectionTitle>
2.2 Co-occurrence Frequency Matrix
</SectionTitle>
      <Paragraph position="0"> In order to select a substituent word for a target word in a given context, we must calculate the probabilities of finding relatives given the context. These probabilities can be estimated from the co-occurrence frequency between a relative and the context words, i.e., the number of times they co-occur, as in Equations 6 and 7.</Paragraph>
      <Paragraph position="1"> In order to calculate these probabilities, the frequencies of words and word pairs are required. For this, we build a co-occurrence frequency matrix that contains the co-occurrence frequencies of word pairs.</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="2" end_page="5" type="metho">
    <Paragraph position="0"> In this matrix, an element m_ij represents the frequency with which the i-th word and the j-th word in the vocabulary co-occur in a corpus. The frequency of a single word can be calculated by summing all frequencies in the same row or column. The vocabulary is composed of all content words in the corpus. Equations 6 and 7 can then be calculated with the matrix, which is easily built by counting each word pair in a given corpus. It is not necessary to build an individual matrix for each polysemous word, since the matrix contains the co-occurrence frequencies of all word pairs; hence, our method disambiguates the senses of all words efficiently with a single matrix.</Paragraph>
    <Section position="1" start_page="4" end_page="5" type="sub_section">
      <SectionTitle>
2.3 WordNet Relatives
</SectionTitle>
      <Paragraph position="0"> Our system used most of the relationship types in WordNet, except the sister and attribute types, to acquire the relatives of target words. For a nominal word, we included all hypernyms and hyponyms within distance 3 of a sense, which indicates parents, grandparents, and great-grandparents for hypernymy, and children, grandchildren, and great-grandchildren for hyponymy. (We implemented the WordNet APIs with the index and data files in the WordNet package, which is downloadable from http://www.cogsci.princeton.edu/~wn/.)</Paragraph>
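The distance-3 harvest of hypernyms and hyponyms can be pictured as a bounded walk over a relation graph. The edge maps below are a toy stand-in for WordNet's pointers, not real WordNet data.

```python
from collections import deque

# Toy hypernym/hyponym edges standing in for WordNet (assumed data).
HYPERNYM = {"chair": ["furniture"], "furniture": ["artifact"],
            "artifact": ["object"], "object": ["entity"]}
HYPONYM  = {"chair": ["armchair"], "armchair": ["recliner"]}

def relatives_within(word, edges, max_dist=3):
    """Walk one relation up to `max_dist` steps: with the hypernym map
    this yields parents, grandparents, and great-grandparents."""
    found, frontier = set(), deque([(word, 0)])
    while frontier:
        node, d = frontier.popleft()
        if d == max_dist:
            continue                       # do not expand beyond distance 3
        for nxt in edges.get(node, []):
            if nxt not in found:
                found.add(nxt)
                frontier.append((nxt, d + 1))
    return found

ups = relatives_within("chair", HYPERNYM)      # up to great-grandparents
downs = relatives_within("chair", HYPONYM)     # down to great-grandchildren
```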
      <Paragraph position="1"> [Table 1: fine-grained and coarse-grained recall and precision]</Paragraph>
      <Paragraph position="2"> Once the POS of the target word is determined, the relationship types related to that POS are considered to acquire the candidate relatives of the target word. For instance, if a target word is an adverb, the following relationships of the word are considered: synonymy, antonymy, and derived.</Paragraph>
    </Section>
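This POS-conditioned choice of relation types amounts to a lookup table. Only the adverb row is stated in the text; the noun row and the fallback below are placeholders added for illustration.

```python
# Hypothetical mapping from POS to the WordNet relation types consulted.
# Only "adverb" comes from the text; the rest are assumed examples.
RELATIONS_BY_POS = {
    "adverb": ["synonymy", "antonymy", "derived"],
    "noun":   ["synonymy", "antonymy", "hypernymy", "hyponymy"],
}

def candidate_relation_types(pos):
    """Relation types to search when gathering relatives for this POS."""
    return RELATIONS_BY_POS.get(pos, ["synonymy"])
```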
    <Section position="3" start_page="5" end_page="5" type="sub_section">
      <SectionTitle>
2.4 WordNet Multiword Expression
</SectionTitle>
      <Paragraph position="0"> Our system recognizes WordNet multiword expressions in an instance by a simple string match before disambiguating the senses of a target word. If the instance contains a multiword expression that includes the target word, our system does not disambiguate the senses of the multiword expression; it simply assigns all senses of the multiword expression to the instance.</Paragraph>
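The string match can be sketched as below; the lexicon entries are invented samples, and matching on the joined token string is one simple realization, not necessarily the authors' exact procedure.

```python
def find_mwe(instance_tokens, target, mwe_lexicon):
    """Return the first lexicon multiword expression that occurs in the
    instance (simple string match) and contains the target word."""
    text = " ".join(instance_tokens)
    for mwe in mwe_lexicon:
        if target in mwe.split() and mwe in text:
            return mwe
    return None

lexicon = ["easy chair", "chair lift"]            # assumed sample entries
tokens = ["he", "sat", "in", "the", "easy", "chair"]
match = find_mwe(tokens, "chair", lexicon)
```

When a match is found, all WordNet senses of the matched expression would be assigned to the instance instead of running the disambiguation step.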
    </Section>
  </Section>
  <Section position="7" start_page="5" end_page="6" type="metho">
    <SectionTitle>
3 Official Results
</SectionTitle>
    <Paragraph position="0"> We participated in both the English lexical sample task and the English all-words task. Tables 1 and 2 show the official results of our system for the two tasks. Our system disambiguates all instances; thus its coverage is 100% and its precision is the same as its recall.</Paragraph>
    <Paragraph position="1"> Our system assigns a WordNet sense key to each instance, but verbs in the English lexical sample task are annotated with Wordsmyth definitions. In the official submission, we did not map the WordNet sense keys of verbs to Wordsmyth senses; thus the recall of our system for verbs was 0%. Table 1 shows the results after a mapping between Wordsmyth and WordNet verb senses using the file EnglishLS.dictionary.mapping.xml.</Paragraph>
    <Paragraph position="2"> In the English all-words task, there are two additional scoring measures in addition to fine- and coarse-grained scoring: with U and without U (these measures are described in Benjamin Snyder's mail). In with U scoring, any instance without a WN sense key is assumed to be tagged with a 'U' and thus is scored as correct if the answer file (i.e., answer.key) has a 'U', and incorrect otherwise. In without U scoring, any instance without a WN sense key is assumed to have been skipped; thus precision is not affected, but recall is lowered.</Paragraph>
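The two schemes differ only in how unanswered instances are counted, which a small scorer makes concrete. The gold keys and answers below are toy data, and the function is a sketch of the described behavior, not the official scorer.

```python
def score(answers, gold, scheme="with_u"):
    """Toy all-words scorer: an instance with no WN sense key counts as
    'U' under with-U, or is skipped (hurting recall only) without-U."""
    attempted = correct = 0
    for inst, gold_key in gold.items():
        key = answers.get(inst)          # None = no WN sense key assigned
        if key is None:
            if scheme == "with_u":
                attempted += 1
                correct += (gold_key == "U")
            # without-U: skipped, so precision is unaffected
            continue
        attempted += 1
        correct += (key == gold_key)
    precision = correct / attempted if attempted else 0.0
    recall = correct / len(gold)
    return precision, recall

gold = {"i1": "k1", "i2": "U", "i3": "k3"}
answers = {"i1": "k1", "i3": "k9"}       # i2 left unanswered
```

On this toy data, with-U credits the unanswered i2 (gold 'U'), while without-U drops it, lowering recall but not precision.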
  </Section>
</Paper>