<?xml version="1.0" standalone="yes"?>
<Paper uid="W99-0502">
  <Title>A Case Study on Inter-Annotator Agreement for Word Sense Disambiguation Hwee Tou Ng</Title>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Sentence Matching
</SectionTitle>
    <Paragraph position="0"> To determine the extent of inter-annotator agreement, the first step ~s to match each sentence m Semcor to its corresponding counterpart In the DSO corpus This step ~s comphcated by the following factors 1 Although the intersected portion of both corpora came from Brown corpus, they adopted different tokemzatmn convention, and segmentartan into sentences differed sometimes 2 The latest versmn of Semcor makes use of the senses from WORDNET 1 6, whereas the senses used m the DSO corpus were from WoRDNET 15 1 To match the sentences, we first converted the senses m the DSO corpus to those of WORDNET 1 6 We ignored all sentences m the DSO corpus m which a word is tagged with sense 0 or -1 (A word is tagged with sense 0 or -1 ff none of the given senses m WoRDNFT applies ) 4, sentence from Semcor is considered to match one from the DSO corpus ff both sentences are exactl) ldent~cal or ff the~ differ only m the pre~ence or absence of the characters &amp;quot; (permd) or -' (hyphen) null For each remaining Semcor sentence, taking into account word ordering, ff 75% or more of the words m the sentence match those in a DSO corpus sentence, then a potential match ~s recorded These i -kctua\[ly, the WORD~q'ET senses used m the DSO corpus were from a shght variant of the official WORDNE'I 1 5 release Th~s ssas brought to our attention after the pubhc release of the DSO corpus potential matches are then manually verffied to ensure that they are true matches and to ~eed out any false matches Using this method of matching, a total of 13,188 sentence-palrs contasnmg nouns and 17,127 sentence-pa~rs containing verbs are found to match from both corpora, ymldmg 30,315 sentences which form the intersected corpus used m our present study</Paragraph>
  </Section>
  <Section position="5" start_page="0" end_page="10" type="metho">
    <SectionTitle>
4 The Kappa Statistic
</SectionTitle>
    <Paragraph position="0"> Suppose there are N sentences m our corpus where each sentence contains the word w Assume that w has M senses Let 4 be the number of sentences which are assigned identical sense b~ two human annotators Then a simple measure to quantify the agreement rate between two human annotators Is Pc, where Pc, = A/N The drawback of this simple measure is that it does not take into account chance agreement between two annotators The Kappa statistic a (Cohen, 1960) is a better measure of rater-annotator agreement which takes into account the effect of chance agreement It has been used recently w~thm computatmnal hngu~stlcs to measure rater-annotator agreement (Bruce and Wmbe, 1998, Carletta, 1996, Veroms, 1998) Let Cj be the sum of the number of sentences which have been assigned sense 3 by annotator 1 and the number of sentences whmh have been assigned sense 3 by annotator 2 Then</Paragraph>
    <Paragraph position="2"> and Pe measures the chance agreement between two annotators A Kappa ~alue of 0 indicates that the agreement is purely due to chance agreement, whereas a Kappa ~alue of 1 indicates perfect agreement A Kappa ~alue of 0 8 and above is considered as mdmatmg good agreement (Carletta, 1996) Table 1 summarizes the inter-annotator agreement on the mtersected corpus The first (becond) row denotes agreement on the nouns (xerbs), wh~le the lass row denotes agreement on all words combined The a~erage ~ reported m the table is a s~mpie average of the individual ~ value of each word The agreement rate on the 30,315 sentences as measured by P= is 57% This tallies with the figure reported ~n our earlier paper (Ng and Lee, 1996) where we performed a quick test on a subset of 5,317 sentences ,n the intersection of both the Semcor corpus and the DSO corpus</Paragraph>
    <Paragraph position="4"/>
  </Section>
  <Section position="6" start_page="10" end_page="10" type="metho">
    <SectionTitle>
5 Algorithm
</SectionTitle>
    <Paragraph position="0"> Since the rater-annotator agreement on the intersected corpus is not high, we would like to find out how the agreement rate would be affected if different sense classes were in use In this section, we present a greedy search algorithm that can automatmalb derive coarser sense classes based on the sense tags assigned by two human annotators The resulting derived coarse sense classes achmve a higher agreement rate but we still maintain as many of the original sense classes as possible The algorithm is given m Figure 1 The algorithm operates on a set of sentences where each sentence contains an occurrence of the word w whmh has been sense-tagged by two human annotators - At each Iteration of the algorithm, tt finds the pair of sense classes Ct and Cj such that merging these two sense classes results in the highest t~ value for the resulting merged group of sense classes It then proceeds to merge Cz and C~ Thin process Is repeated until the ~ value reaches a satisfactory value ~,~t,~, which we set as 0 8 Note that this algorithm is also applicable to deriving any coarser set of classes from a refined set for any NLP tasks in which prior human agreement rate may not be high enough Such NLP tasks could be discourse tagging, speech-act categorization, etc</Paragraph>
  </Section>
class="xml-element"></Paper>