<?xml version="1.0" standalone="yes"?> <Paper uid="J98-4002"> <Title>Selective Sampling for Example-based Word Sense Disambiguation</Title> <Section position="2" start_page="0" end_page="574" type="abstr"> <SectionTitle> 1. Introduction </SectionTitle> <Paragraph position="0"> Word sense disambiguation is a potentially crucial task in many NLP applications, such as machine translation (Brown, Della Pietra, and Della Pietra 1991), parsing (Lytinen 1986; Nagao 1994) and text retrieval (Krovets and Croft 1992; Voorhees 1993). Various corpus-based approaches to word sense disambiguation have been proposed (Bruce and Wiebe 1994; Charniak 1993; Dagan and Itai 1994; Fujii et al. 1996; Hearst 1991; Karov and Edelman 1996; Kurohashi and Nagao 1994; Li, Szpakowicz, and Matwin 1995; Ng and Lee 1996; Niwa and Nitta 1994; Sch~itze 1992; Uramoto 1994b; Yarowsky 1995). The use of corpus-based approaches has grown with the use of machine-readable text, because unlike conventional rule-based approaches relying on hand-crafted selectional rules (some of which are reviewed, for example, by Hirst \[1987\]), corpus-based approaches release us from the task of generalizing observed phenomena through a set of rules. Our verb sense disambiguation system is based on such an approach, that is, an example-based approach. A preliminary experiment showed that our system performs well when compared with systems based on other approaches, and motivated * Department of Library and Information Science, University of Library and Information Science, 1-2 Kasuga, Tsukuba, 305-8550, Japan t Department of Artificial Intelligence, Faculty of Computer Science and Systems Engineering, Kyushu Institute of Technology, 680-4, Kawazu, Iizuka, Fukuoka 820-0067, Japan ~t Department of Computer Science, Tokyo Institute of Technology, 2-12-10ookayama Meguroku Tokyo 152-8552, Japan (~) 1998 Association for Computational Linguistics Computational Linguistics Volume 24, Number 4 us to further explore the example-based approach (we elaborate on this experiment in Section 2.3). At the same time, we concede that other approaches for word sense disambiguation are worth further exploration, and while we focus on example-based approach in this paper, we do not wish to draw any premature conclusions regarding tlhe relative merits of different generalized approaches.</Paragraph> <Paragraph position="1"> As with most example-based systems (Fujii et al. 1996; Kurohashi and Nagao 1994; Li, Szpakowicz, and Matwin 1995; Uramoto 1994b), our system uses an example database (database, hereafter) that contains example sentences associated with each verb sense. Given an input sentence containing a polysemous verb, the system chooses the most plausible verb sense from predefined candidates. In this process, the system computes a scored similarity between the input and examples in the database, and choses the verb sense associated with the example that maximizes the score. To realize this, we have to manually disambiguate polysemous verbs appearing in examples, prior to their use by the system. We shall call these examples supervised examples.</Paragraph> <Paragraph position="2"> A preliminary experiment on eleven polysemous Japanese verbs showed that (a) the more supervised examples we provided to the system, the better it performed, and (b) in order to achieve a reasonable result (say over 80% accuracy), the system needed a hundred-order supervised example set for each verb. 
Therefore, in order to build an operational system, the following problems have to be taken into account: first, given human resource limitations, it is not reasonable to supervise every example in large corpora ("overhead for supervision"); second, since example-based systems, including ours, search the database for the examples most similar to the input, the computational cost becomes prohibitive if one works with a very large database ("overhead for search").</Paragraph>
<Paragraph position="3"> These problems suggest a different approach, namely to select a small number of optimally informative examples from given corpora. Hereafter we will call these examples samples.</Paragraph>
<Paragraph position="4"> Our example sampling method, based on the utility maximization principle, decides on the preference for including a given example in the database. This decision procedure is usually called selective sampling (Cohn, Atlas, and Ladner 1994). The overall control flow of selective sampling systems can be depicted as in Figure 1, where "system" refers to our verb sense disambiguation system, and "examples" refers to an unsupervised example set. The sampling process basically cycles between the word sense disambiguation (WSD) and training phases. During the WSD phase, the system generates an interpretation for each polysemous verb contained in the input examples ("WSD outputs" in Figure 1). This phase is equivalent to normal word sense disambiguation execution. During the training phase, the system selects samples for training from the previously produced outputs. During this phase, a human expert supervises the samples, that is, provides the correct interpretation for the verbs appearing in them. Thereafter, the samples are simply incorporated into the database without any computational overhead (as would be associated with globally reestimating parameters).</Paragraph>
[Figure 1: Flow of control of the example sampling system.]
<Paragraph position="5"> By iterating between these two phases, the system progressively enhances the database. Note that the selective sampling procedure gives us an optimally informative database of a given size irrespective of the stage at which processing is terminated.</Paragraph>
<Paragraph position="6"> Several researchers have proposed this type of approach for NLP applications.</Paragraph>
<Paragraph position="7"> Engelson and Dagan (1996) proposed a committee-based sampling method, which is currently applied to HMM training for part-of-speech tagging. This method sets up several models (the committee) derived from a given supervised data set, and selects samples based on the degree of disagreement among the committee members as to the output. This method is implemented for statistics-based models; how to formalize and map the concept of selective sampling onto example-based approaches has yet to be explored.</Paragraph>
<Paragraph position="8"> Lewis and Gale (1994) proposed an uncertainty sampling method for statistics-based text classification. In this method, the system always samples outputs with an uncertain level of correctness. In an example-based approach, we should also take into account the training effect a given example has on other unsupervised examples.</Paragraph>
<Paragraph position="9"> This is introduced as training utility in our method.
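To make the sampling cycle of Figure 1 concrete, here is a schematic Python sketch. The training-utility function, batch size, and supervision callback are placeholders rather than the paper's implementation; the actual training utility is developed in Section 3:

    # Schematic of the selective sampling cycle of Figure 1. The function and
    # parameter names are illustrative placeholders, not the paper's
    # implementation; the actual training utility is defined in Section 3.
    def selective_sampling(database, unsupervised, training_utility,
                           supervise, batch_size, budget):
        """Grow the supervised database by alternating WSD and training phases.

        database         -- list of (example, sense) pairs collected so far
        unsupervised     -- pool of examples not yet labeled
        training_utility -- estimates how informative supervising an example
                            would be for the rest of the pool
        supervise        -- obtains the correct sense from a human expert
        """
        while unsupervised and budget > len(database):
            # WSD phase: interpret the pool and rank the examples by their
            # estimated training utility.
            ranked = sorted(unsupervised,
                            key=lambda ex: training_utility(ex, database, unsupervised),
                            reverse=True)
            # Training phase: a human expert supervises the top-ranked samples,
            # which are added to the database with no global re-estimation.
            for ex in ranked[:batch_size]:
                database.append((ex, supervise(ex)))
                unsupervised.remove(ex)
        return database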
We devote Section 4 to further comparison of our approach with other related work.</Paragraph>
<Paragraph position="10"> With respect to the problem of overhead for search, possible solutions would include the generalization of similar examples (Kaji, Kida, and Morimoto 1992; Nomiyama 1993) or the reconstruction of the database using a small portion of useful instances selected from a given supervised example set (Aha, Kibler, and Albert 1991; Smyth and Keane 1995). However, such approaches imply a significant overhead for supervision of each example prior to the system's execution. This shortcoming is precisely what our approach aims to avoid: we aim to reduce the overhead for supervision as well as the overhead for search.</Paragraph>
<Paragraph position="11"> Section 2 describes the basis of our verb sense disambiguation system and the preliminary experiment in which we compared our method with other disambiguation methods. Section 3 then elaborates on our example sampling method. Section 4 reports on the results of our experiments through comparison with other proposed selective sampling methods, and discusses theoretical differences between those methods.</Paragraph> </Section> </Paper>