<?xml version="1.0" standalone="yes"?>
<Paper uid="W93-0101">
  <Title>Word Sense Disambiguation by Human Subjects: Computational and Psycholinguistic Applications</Title>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 Design of the Questionnaire
</SectionTitle>
    <Paragraph position="0"> The prototype version of the present questionnaire was a printed list of 100 test texts, each with an ambiguous word highlighted and a list of definitions following. Subjects typically took the test home, and reported needing anything from half an hour to several days to complete the questionnaire.</Paragraph>
    <Paragraph position="1"> The test was difficult to complete for several reasons. The test texts were themselves dictionary definitions, chosen at random from the machine-readable version of the Collins English Dictionary (CED). (This was because the project grew out of an effort specifically to disambiguate definition texts in the CED.) Many of the words being defined by the test texts were highly obscure, e.g.</Paragraph>
    <Paragraph position="2"> paduasoy n. a rich strong silk fabric used for hangings, vestments, etc.</Paragraph>
    <Paragraph position="3"> Or India paper n. another name (not in technical usage) for bible paper \[Ahlswede and Lorand, 1993\] Disambiguation was done (as it still is in the present questionnaire) by choosing one or more from a set of dictionary definitions of the highlighted word. This was hard work, and volunteers were hard to find. Therefore, though the present version of the questionnaire avoids &amp;quot;hard&amp;quot; words except where these are explicitly being studied, it is still tough enough that we pay our subjects a small honorarium.</Paragraph>
    <Paragraph position="4"> Like its prototype, the present questionnaire consists of 100 test texts, each with an ambiguous test word or short phrase (e.g., ring up, go over). The number 100 was chosen, based on our experience with the prototype, as a compromise between a smaller test, easier on the subject but less informative, and a larger test which might be prohibitively difficult or time-consuming to take.</Paragraph>
  </Section>
  <Section position="5" start_page="0" end_page="4" type="metho">
    <SectionTitle>
5 The Test Texts
</SectionTitle>
    <Paragraph position="0"> Source. The test texts have been selected in part to represent a wide variety of written English, while using a minimum of different sources in order to facilitate comparison within each category as well as between categories. The distribution was:  One of the original criteria for both test words and test texts, neutrality between British and American usage \[Ahlswede, 1992\], was found virtually impossible to maintain. The CED is British, and many if not most of its multi-sense entries include definitions of idiomatic British usages. To leave these out would be to risk distorting the results as a metric for a disambiguation program that used the CED as a whole without excluding those particular definitions. The other categories are American, and in the interest of consistency, American idioms were freely permitted as well.</Paragraph>
    <Paragraph position="1"> Several other criteria for selecting test texts were retained and followed: 1. Difficulty of resolution. This can only be estimated subjectively until the questionnaire results are in, except for the twenty dictionary definitions, where a rough measure of difficulty of resolution is provided by the &amp;quot;coefficient of certainty&amp;quot; \[Ahlswede, forthcoming\]. null A second measure, the &amp;quot;coefficient of dissent&amp;quot;, specifically measures disagreement as opposed to uncertainty. The high negative correlation between coefficient of certainty and coefficient of dissent (-0.942) indicated that, in practice, there was little difference between widespread uncertainty and widespread disagreement.</Paragraph>
    <Paragraph position="2"> Partly because of the apparent lack of importance in this distinction, and partly for the convenience of automating the questionnaire, the &amp;quot;0&amp;quot; option in the prototype has been eliminated. The subject is forced to decide &amp;quot;yes&amp;quot; or &amp;quot;no&amp;quot; to each sense. Size of context. The test texts are complete sentences, or (in the case of CED) complete definitions. In some cases phrases have been deleted with ellipsis, where the full text seemed unmanageably long and the deleted phrase irrelevant to the disambiguation of the test word. The net sentence length ranges from 5 to 28 with a median of 14. Results so far indicate, as did Lesk's observations, that sentence length does not significantly affect performance.</Paragraph>
    <Paragraph position="3"> Global context was early recognized as a potential problem: human disambiguation decisions are made not only on the basis of the immediate sentence-level context, but also on an awareness of the domain: for instance, the word capital is likely (though not certain) to mean one thing in the Wall Street Journal and another thing in a political editorial about the federal government.</Paragraph>
    <Paragraph position="4"> Since the test texts are short and have no global context whatever, we compensate by adding a small parenthetical note at the end of each text, identifying it as &amp;quot;WSJ&amp;quot;, &amp;quot;Tips&amp;quot;, &amp;quot;CED&amp;quot;, &amp;quot;Twain&amp;quot; or &amp;quot;special&amp;quot;. The meaning of these short tags is explained to the subject, and though not the same as actual global context, they provide explicitly the information the reader normally deduces during reading.</Paragraph>
  </Section>
  <Section position="6" start_page="4" end_page="4" type="metho">
    <SectionTitle>
6 The Test Words
</SectionTitle>
    <Paragraph position="0"> An factor which is probably important, but impossible to measure, is the familiarity of a test word. Two contrasting intuitions about familiarity are (1) an unfamiliar word should be harder to disambiguate because its senses are less well known to the informant; but (2) a familiar word should be harder because it is likely to have more senses and homographs. Since familiarity is not only completely subjective, but also varies widely from one individual to another, we turn to a much more measurable criterion: Frequency. An unanswered question is whether it is more appropriate to measure word frequency based on the specialized corpora from which the texts are extracted, or based on a single average word frequency list. The texts taken from the CED, the Wall Street Journal, and the &amp;quot;Tips&amp;quot;, having been extracted from multi-million-word corpora, can be measured separately. Unfortunately, we have no online corpus of Mark Twain's works, and the &amp;quot;special&amp;quot; texts are, by definition, not from any corpus at all.</Paragraph>
    <Paragraph position="1"> Part of speech. Studies of disambiguation have focused almost exclusively on nouns, verbs and adjectives, and hardly at all on &amp;quot;function words&amp;quot; such as prepositions, conjunctions, and those adverbs not derived from adjectives. (An exception is Brugman and Lakoff \[1988\], who study the word over.) We are interested in in both kinds of words.</Paragraph>
    <Paragraph position="2"> Therefore the test words include 28 nouns, 22 verbs, 19 adjectives, 16 adverbs (none in -ly), and 15 assorted prepositions, conjunctions and pronouns.</Paragraph>
    <Paragraph position="3"> Given the combination of a British dictionary with such ultra-American sources as Mark Twain, we were unable to guarantee variety neutrality in our test words as in our test texts. An alternative, however, was to include among the &amp;quot;special&amp;quot; texts two with strong variety bias: I took the tube to the repair shop, ambiguous in British but not in American, and It was a long and unpleasant fall,, ambiguous in American hut not (or less so) in British. These were added in the hope that native or learned speakers of British English would handle them differently than speakers of American English.</Paragraph>
  </Section>
  <Section position="7" start_page="4" end_page="4" type="metho">
    <SectionTitle>
7 The User Interface
</SectionTitle>
    <Paragraph position="0"> An important feature of the questionnaire is its user interface. This was developed by one of us (Lorand) in Macintosh HyperCard.</Paragraph>
    <Paragraph position="1"> The interface consists of four principal modules (&amp;quot;stacks&amp;quot; in hypertext terminology): (1) a top-level stack that drives the interface as a whole; (2) a &amp;quot;demographics&amp;quot; stack that manages a menu of demographic and identifying information that the subject fills out; (3) the &amp;quot;import questionnaire&amp;quot; stack, which allows the questionnaire to exist independently of the interface as an editable text file, and to be reinserted into the interface as desired, e.g., after changes have been made; (4) the questionnaire itself, translated automatically into MetaTalk, the MetaCard programming language.</Paragraph>
  </Section>
  <Section position="8" start_page="4" end_page="4" type="metho">
    <SectionTitle>
8 The demographics stack
</SectionTitle>
    <Paragraph position="0"> The menu of the demographics stack first solicits non-identifying portions of the subject's Social Security number and birthday, which are hashed to form a unique, confidential ID for that subject. The menu then solicits potentially relevant demographic information: age, gender, native/non-native speaker of English, number of years speaking English if non-native, and highest educational degree. This last is an extremely rough measure of literacy, but no better one is available, and the preliminary experiment showed that doctoral-level subjects agreed more closely with each other than the non-doctoral subjects did either with the doctorates or with each other \[Ahlswede, forthcoming\].</Paragraph>
    <Paragraph position="1"> The ID and the demographic information are written to a text file in numerically coded form. The subject may then begin the questionnaire or cancel.</Paragraph>
  </Section>
  <Section position="9" start_page="4" end_page="4" type="metho">
    <SectionTitle>
9 The questionnaire stack
</SectionTitle>
    <Paragraph position="0"> The questionnaire is implemented as a series of windows, one for each test text and its associated definitions. The test text is displayed at the top of the window, with the test word in boldface. Below is a subwindow containing the definitions. The subject clicks on a definition to identify it as a good disambiguation; the typeface of the selected definition changes to boldface. Clicking on a selected definition will de-select it and its typeface will change back to regular. Any number of definitions may be selected. If, as sometimes happens, there are too many definitions to fit within the subwindow, it can be scrolled up and down to give access to all the definitions. Arrow buttons at the bottom right and bottom left enable the subject to go ahead to the next text or back to the previous one.</Paragraph>
    <Paragraph position="1"> Every action by the subject is logged, as is its time, in the log file. Thus when the subject is done, we have a complete record of his or her actions, of the time at which each action took place, and thus of the interval between each pair of actions.</Paragraph>
  </Section>
  <Section position="10" start_page="4" end_page="4" type="metho">
    <SectionTitle>
10 The Subjects
</SectionTitle>
    <Paragraph position="0"> So far, most of the subjects recruited have been students, with some faculty and staff.</Paragraph>
    <Paragraph position="1"> We are presently recruiting off campus. Probably thanks to the honorarium, response has been enthusiastic: well over the 100 subjects we considered necessary for an adequate sample. Because we are still occupied with data collection, intensive analysis of the data has not begun yet.</Paragraph>
  </Section>
class="xml-element"></Paper>
Download Original XML