File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/96/w96-0305_intro.xml

Size: 7,358 bytes

Last Modified: 2025-10-06 14:06:08

<?xml version="1.0" standalone="yes"?>
<Paper uid="W96-0305">
  <Title>Acquisition of Computational-Semantic Lexicons from Machine Readable Lexical Resources</Title>
  <Section position="2" start_page="0" end_page="32" type="intro">
    <SectionTitle>
1. Introduction
</SectionTitle>
    <Paragraph position="0"> Treatment of lexical ambiguity, such as word sense disambiguation (WSD), has been found useful in many NLP applications, including information retrieval (McRoy 1992; Krovetz and Croft 1992) and machine translation (Brown et al. 1991; Dagan et al. 1991; Dagan and Itai 1994).</Paragraph>
    <Paragraph position="1"> Recently, various approaches to word sense division (Dolan 1994; Luk 1995; Yarowsky 1992; Dagan et al. 1991; Dagan and Itai 1994) have been used in WSD research. Directly using dictionary senses as the sense division has several advantages. First, sense distinctions according to a dictionary are readily available from machine readable dictionaries (MRDs) such as the Longman Dictionary of Contemporary English (LDOCE; Longman 1992). Second, indicative words and concepts for each sense are directly available in the numbered definitions and examples. Lesk (1986) demonstrated that dictionary entries can be used to generate signatures of senses for WSD. However, using an MRD as the knowledge source for sense division and disambiguation raises certain problems. Dolan (1994) observed that sense division in MRDs is frequently too fine for the purpose of WSD: a WSD system based on dictionary senses faces unnecessary and difficult "forced choices." Most researchers have resorted to human intervention to identify and group closely related senses.</Paragraph>
    <Paragraph position="2"> This paper describes a heuristic algorithm capable of automatically assigning a label to each of the senses in a machine readable dictionary (MRD), for the purpose of acquiring a computational-semantic lexicon for the treatment of lexical ambiguity. Including these labels in the MRD-based lexical database has several benefits. In particular, the labels can serve as a coarser sense division, so that unnecessarily fine sense distinctions can be avoided in word sense disambiguation (WSD). The algorithm is based primarily on simple word matching between an MRD definition sentence and the word lists of a topic in the Longman Lexicon of Contemporary English (LLOCE; McArthur 1992).</Paragraph>
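The core operation named above, matching the words of a definition sentence against the word list of an LLOCE topic, can be sketched as a simple overlap count. This is a minimal Python sketch under stated assumptions, not the paper's implementation: the stopword list, the tokenizer, and the example topic words are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical stopword list (an assumption for this sketch, not from the paper).
STOPWORDS = {"a", "an", "the", "of", "in", "on", "at", "along", "or", "and", "etc"}


def keywords(definition):
    """Return the content words of a definition sentence as a lower-cased set."""
    # Keep hyphenated compounds such as "race-track" as single tokens.
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", definition.lower())
    return {t for t in tokens if t not in STOPWORDS}


def overlap(definition, topic_words):
    """Count the definition keywords that also occur in a topic's word list."""
    return len(keywords(definition).intersection(topic_words))


# Illustrative topic word list (hypothetical; not the actual LLOCE entries).
geography_words = {"river", "lake", "hill", "valley", "shore"}
print(overlap("land along the side of a river, lake, etc.", geography_words))  # prints 2
```

A raw overlap count treats every shared word equally; it is only the starting point for the weighted comparisons a labeler would need when several candidate topics compete.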
    <Paragraph position="3"> We begin by giving the details of the materials used, including the characteristics of definition sentences in LDOCE and the organization of words in LLOCE. Next, the algorithm for labeling LDOCE senses is described, and an illustrative example demonstrates its effectiveness. After describing the algorithm, we present the experimental results for a 12-word test set. Our discussion also covers possible implications of the labels for problems such as acquisition of a lexicon capable of providing broad coverage, systematic word sense shifts, lexical underspecification, and acquisition of zero-derivatives at the sense level. In addition, the proposed algorithm is compared with other approaches in the literature.</Paragraph>
    <Paragraph position="4"> Finally, concluding remarks are made.</Paragraph>
    <Paragraph position="5"> 2. Identifying the topic of senses
The labeling of dictionary definition sentences with a coarse sense distinction, such as the set labels in LLOCE, is a special form of the WSD problem. No simple method can solve the general problem of WSD for unrestricted text. We will show, however, that this labeling task is simpler than the general problem for several reasons. For example, consider the definition sentences for the first five senses of "bank" in LDOCE:
1. land along the side of a river, lake, etc.</Paragraph>
    <Paragraph position="6"> 2. earth which is heaped up in a field or garden, often making a border or division.</Paragraph>
    <Paragraph position="7"> 3. a mass of snow, clouds, mud, etc.</Paragraph>
    <Paragraph position="8"> 4. a slope made at bends in a road or race-track, so that they are safer for cars to go round.</Paragraph>
    <Paragraph position="9"> 5. = SANDBANK (a high underwater bank of sand in a river, harbour, etc.).</Paragraph>
    <Paragraph position="10"> First of all, only simple words are used in the definitions. Furthermore, the text generation schemes are rather regular. The scheme that lexicographers used in generating the definitions above is similar to the DEFINITION scheme described in McKeown (1985). A DEFINITION scheme begins with a genus term (that is, a conceptual parent or ancestor of the sense), followed by the so-called differentia, which consists of words semantically related to the sense that provide specifics about it. The relations between the sense and its defining words are reflected in the semantic clusters termed categorical, functional, and situational clusters in McRoy (1992). Moreover, these relations have been shown to be very effective knowledge sources for WSD (McRoy 1992) and for the interpretation of noun sequences (Vanderwende 1994). For instance, land, earth, mass, slope, and sand are genus terms that are categorically related to bank. On the other hand, words in the differentia such as river, lake, field, garden, bend, road, race-track, and harbour are situationally related to bank through the Location relation. Other keywords such as road and race-track are related functionally to bank through the PartOf relation. For the most part, these relations exist conveniently among words under the same topic or across cross-referencing topics in LLOCE. For instance, most of the words mentioned above are listed under Ld (Geography), the topic of the intended label Ld099, or under its cross reference Me (Places). Therefore, these definitions can be disambiguated very effectively on the basis of similarity between the defining keywords and the word lists in LLOCE.</Paragraph>
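The genus/differentia structure of the DEFINITION scheme described above can be approximated with a very crude heuristic. The Python sketch below assumes, purely for illustration, that the genus term is the first content word and that every later content word belongs to the differentia; the function-word list and the helper name split_definition are hypothetical, and this is not the paper's method.

```python
import re

# Hypothetical list of function words skipped when looking for content words.
FUNCTION_WORDS = {
    "a", "an", "the", "of", "in", "on", "at", "along", "up", "or", "and",
    "which", "is", "are", "so", "that", "they", "for", "to", "etc", "often",
}


def split_definition(definition):
    """Split a DEFINITION-scheme sentence into (genus, differentia keywords).

    Heuristic assumption: the first content word is the genus term and all
    later content words form the differentia. Cross-reference definitions
    such as "= SANDBANK (...)" would need separate handling.
    """
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", definition.lower())
    content = [t for t in tokens if t not in FUNCTION_WORDS]
    if not content:
        return None, []
    return content[0], content[1:]


bank_senses = [
    "land along the side of a river, lake, etc.",
    "earth which is heaped up in a field or garden, often making a border or division.",
    "a mass of snow, clouds, mud, etc.",
    "a slope made at bends in a road or race-track, so that they are safer for cars to go round.",
]
for sense in bank_senses:
    genus, differentia = split_definition(sense)
    print(genus, differentia)
```

For the first four senses of "bank", this sketch picks out land, earth, mass, and slope, the genus terms named in the text. The fifth sense, a cross reference to SANDBANK whose genus "sand" sits inside a parenthetical, is exactly the kind of case the naive first-word rule misses.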
    <Paragraph position="11"> 2.1. Organizing information in LLOCE
In this work, the labels used for tagging dictionary definitions are taken from LLOCE (McArthur 1992). Words in LLOCE are organized mainly according to subject matter: nearly 2,500 sets of related words are grouped into 14 subjects and 129 topics (TOP). Cross references (REF) between sets, topics, and subjects are also given to show various inter-sense relations not captured within the same topic; these cross references are primarily between topics.</Paragraph>
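The subject/topic/set hierarchy described above can be modeled by parsing set labels. The Python sketch below infers the label layout from examples that appear in this section (Ld099, Je104, Nj295): an upper-case subject letter, a lower-case topic letter, and a three-digit set number. That layout is an assumption, not a documented LLOCE specification. The cross-reference table holds only the two topic links mentioned in the text (Je to De, Ld to Me); the names SetLabel, parse_label, and referenced_topics are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SetLabel:
    subject: str  # one upper-case letter, e.g. "L"
    topic: str    # subject letter plus topic letter, e.g. "Ld"
    number: int   # numeric part of the label, e.g. 99


def parse_label(label):
    """Parse an LLOCE set label such as "Ld099" (layout inferred from examples)."""
    if not (len(label) == 5 and label[0].isupper() and label[1].islower()
            and label[2:].isdigit()):
        raise ValueError(f"unexpected LLOCE set label: {label!r}")
    return SetLabel(subject=label[0], topic=label[:2], number=int(label[2:]))


# Topic-level cross references mentioned in the text (far from complete).
CROSS_REFS = {"Je": ["De"], "Ld": ["Me"]}


def referenced_topics(label):
    """Topics that the topic of `label` cross-references, if any are recorded."""
    return CROSS_REFS.get(parse_label(label).topic, [])


print(parse_label("Ld099"))        # SetLabel(subject='L', topic='Ld', number=99)
print(referenced_topics("Je104"))  # ['De']
```

Keying the cross references by topic rather than by set reflects the statement that LLOCE cross references are primarily between topics, so every set under Je inherits the link to De.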
    <Paragraph position="12"> The sets under which a word is listed in LLOCE are considered the initial candidates for labeling. For instance, the candidates for labeling the senses of "bank" are the following four set labels:  Ld099 and Nj295 are listed under Ld (Geography) and Nj (Action and Position), respectively. Cross references link topics; for instance, there is a REF link (in Figure 1) from topic Je to topic De (Belonging and Owning, Getting and Giving). To facilitate estimation of the similarity between a definition sentence and a topic, we use $\mathrm{TOP}_S$ to denote the list of words under the LLOCE topic of S, while $\mathrm{REF}_S$ denotes the list of words under the cross references of S. For instance, the label Je104 (as well as Je106) is associated with a list of words from its topic ($\mathrm{TOP}_{Je104}$) and its cross reference ($\mathrm{REF}_{Je104} = \mathrm{TOP}_{De}$): $\mathrm{TOP}_{Je104} = \mathrm{TOP}_{Je}$ = {affluent, budget, cut down, deficit, economize, fortune, giro, income, keep, luxury, maintenance, needy, pay, windfall, amenity, ...}</Paragraph>
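The TOP and REF word lists defined above suggest a way to rank the candidate labels for a definition. The exact similarity measure is not given in this excerpt, so the Python sketch below assumes a simple weighted sum: matches against the topic's own list count 1, matches against cross-referenced lists count a hypothetical weight of 0.5. The Je word list reuses the words quoted in the text; the De, Ld, and Me lists are invented placeholders, and score and best_label are hypothetical helper names.

```python
# TOP word lists keyed by topic code. The "Je" words are quoted in the text;
# the "De", "Ld" and "Me" lists are hypothetical placeholders.
TOP = {
    "Je": {"affluent", "budget", "cut down", "deficit", "economize", "fortune",
           "giro", "income", "keep", "luxury", "maintenance", "needy", "pay",
           "windfall", "amenity"},
    "De": {"own", "belong", "give", "get", "property"},
    "Ld": {"land", "river", "lake", "hill", "shore"},
    "Me": {"field", "garden", "road", "harbour"},
}
# Topic cross references stated in the text: Je -> De and Ld -> Me.
CROSS_REFS = {"Je": ["De"], "Ld": ["Me"]}
REF_WEIGHT = 0.5  # hypothetical: a cross-reference match counts half as much


def top_words(label):
    """TOP_S: words listed under the topic of set label S (e.g. Je104 -> Je)."""
    return TOP.get(label[:2], set())


def ref_words(label):
    """REF_S: words listed under the topics that the topic of S cross-references."""
    words = set()
    for topic in CROSS_REFS.get(label[:2], []):
        words.update(TOP.get(topic, set()))
    return words


def score(keywords, label):
    """Weighted overlap of definition keywords with TOP_S and REF_S."""
    top_hits = len(keywords.intersection(top_words(label)))
    ref_hits = len(keywords.intersection(ref_words(label)))
    return top_hits + REF_WEIGHT * ref_hits


def best_label(keywords, candidates):
    """Pick the candidate set label whose word lists best match the keywords."""
    return max(candidates, key=lambda label: score(keywords, label))


candidates = ["Je104", "Je106", "Ld099", "Nj295"]
sense_4 = {"slope", "bends", "road", "race-track", "cars"}
print(best_label(sense_4, candidates))  # Ld099
```

In this toy setup, sense 4 of "bank" reaches Ld099 only through the cross reference to Me, which illustrates why REF lists matter. When no candidate matches at all, `max` simply returns the first candidate, so a real labeler would need an explicit rule for that case.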
    <Paragraph position="14"/>
  </Section>
</Paper>