<?xml version="1.0" standalone="yes"?>
<Paper uid="W02-0817">
  <Title>Building a Sense Tagged Corpus with Open Mind Word Expert</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Most efforts in the Word Sense Disambiguation (WSD) field have concentrated on supervised learning algorithms. These methods usually achieve the best performance, at the cost of low recall. Their main weaknesses are the lack of widely available semantically tagged corpora and the strong dependence of disambiguation accuracy on the size of the training corpus. Tagging is usually done by trained lexicographers and is consequently quite expensive, which limits such corpora to a handful of tagged texts.</Paragraph>
    <Paragraph position="1"> This paper introduces Open Mind Word Expert, a Web-based system that aims to create large sense-tagged corpora with the help of Web users. The system has an active learning component that selects the most difficult examples, which are then presented to the human taggers. We expect that the system will yield more training data of comparable quality, at a significantly lower cost, than the traditional method of hiring lexicographers.</Paragraph>
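    <Paragraph> The selection step of an active learning component can be illustrated with a minimal sketch. It assumes uncertainty sampling by entropy, which is one common criterion and not necessarily the one used in Open Mind Word Expert; the names entropy, select_hardest and pool, and the example probabilities, are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in bits) of a probability distribution over senses."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def select_hardest(examples, k):
    """Return the ids of the k examples with the most uncertain predictions.

    `examples` is a list of (example_id, sense_probabilities) pairs, where the
    probabilities are assumed to come from some already trained classifier.
    """
    ranked = sorted(examples, key=lambda ex: entropy(ex[1]), reverse=True)
    return [example_id for example_id, _ in ranked[:k]]


# Hypothetical classifier outputs for four untagged occurrences of one word.
pool = [
    ("ex1", [0.98, 0.01, 0.01]),  # classifier is confident: easy example
    ("ex2", [0.40, 0.35, 0.25]),  # close to uniform: hard example
    ("ex3", [0.70, 0.20, 0.10]),
    ("ex4", [0.34, 0.33, 0.33]),  # nearly uniform: hardest example
]

# The two most uncertain examples would be shown to human taggers first.
hardest = select_hardest(pool, 2)
print(hardest)  # ['ex4', 'ex2']
```

Under this assumption, examples on which the classifier is already confident are left out of the tagging queue, so human effort goes to the cases most likely to improve the model.</Paragraph>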
    <Paragraph position="2"> Open Mind Word Expert is a new project that follows the Open Mind initiative (Stork, 1999). The basic idea behind Open Mind is to use the information and knowledge that can be collected from the millions of existing Web users, with the goal of creating more intelligent software.</Paragraph>
    <Paragraph position="3"> This idea has been used in Open Mind Common Sense, which acquires commonsense knowledge from people. A knowledge base of about 400,000 facts has been built by learning facts from 8,000 Web users over a one-year period (Singh, 2002).</Paragraph>
    <Paragraph position="4"> If Open Mind Word Expert experiences a similar learning rate, we expect to obtain shortly a corpus that exceeds the size of all previously tagged data. During the first fifty days of activity, we collected about 26,000 tagged examples without any significant effort to publicize the site. We expect this rate to increase gradually as the site becomes more widely known and receives more traffic.</Paragraph>
  </Section>
</Paper>