File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/99/w99-0502_intro.xml
Size: 1,965 bytes
Last Modified: 2025-10-06 14:06:58
<?xml version="1.0" standalone="yes"?> <Paper uid="W99-0502"> <Title>A Case Study on Inter-Annotator Agreement for Word Sense Disambiguation Hwee Tou Ng</Title> <Section position="3" start_page="0" end_page="0" type="intro"> <SectionTitle> 1998a, Kflgarrlff, 1998b) 2 A Case Study </SectionTitle> <Paragraph position="0"> In this-paper, we examine the ~ssue of rater-annotator agreement by comparing the agreement rate of human annotators on a large sense-tagged corpus of more than 30,000 instances of the most frequently occurring nouns and verbs of Enghsh This corpus is the intersection of the WORDNET Semcor corpus (Miller et al, 1993) and the DSO corpus (Ng and Lee, 1996, Ng, 1997), which has been independently tagged wlth the refined senses of WORDNET by two separate groups of human annotators The Semcor corpus us a subset of the Brown corpus tagged with ~VoRDNET senses, and consists of more than 670,000 words from 352 text files Sense taggmg was done on the content words (nouns, ~erbs, adjectives and adverbs) m this subset The DSO corpus consists of sentences drawn from the Brown corpus and the Wall Street Journal For each word w from a hst of 191 frequently occurring words of Enghsh (121 nouns and 70 verbs), sentences containing w (m singular or plural form, and m its various reflectional verb form) are selected and each word occurrence w ~s tagged w~th a sense from WoRDNET There ~s a total of about 192,800 sentences in the DSO corpus m which one word occurrence has been sense-tagged m each sentence The intersection of the Semcor corpus and the DSO corpus thus consists of Brown corpus sentences m which a word occurrence w is sense-tagged m each sentence, where w Is one of.the 191 frequently oc,currmg English nouns or verbs Since this common pomon has been sense-tagged by two independent groups of human annotators, ~t serves as our data set for investigating inter-annotator agreement in this paper</Paragraph> </Section> class="xml-element"></Paper>