<?xml version="1.0" standalone="yes"?> <Paper uid="P98-1037"> <Title>A Concept-based Adaptive Approach to Word Sense Disambiguation</Title> <Section position="3" start_page="237" end_page="238" type="metho"> <SectionTitle> WSD Result </SectionTitle> <Paragraph position="0"> [Figure 1: General framework for WSD using MRD. Sample WSD results shown in the figure: (1) ... investigation of bank/MONEY check fraud/CRIME ...; (2) ... looted/CRIME stores and robbed/CRIME banks/MONEY ...; (3) ... a deer/ANIMAL near the river bank/GEO ...; (4) A bank/GEO vole/ANIMAL.]</Paragraph> <Paragraph position="1"> disambiguation, an adaptation step is taken to make the knowledge base more relevant to the task at hand, leading to broader and more precise WSD.</Paragraph> <Paragraph position="2"> Figure 1 lays out the general framework for an adaptive conceptual WSD approach, under which this research is being carried out. The learning process described here begins with a step of knowledge acquisition from machine-readable dictionaries (MRDs).</Paragraph> <Paragraph position="3"> With the acquired knowledge, the system reads the input text and performs the initial disambiguation step. An adaptation step follows, combining the initial knowledge base with knowledge gleaned from the partially disambiguated text; the adapted knowledge base is then applied to the text again to finalize the disambiguation result. For instance, Figure 1 shows that the initial contextual representation (CR) extracted from the Longman Dictionary of Contemporary English (Procter 1978, LDOCE) for the GEO-bank sense contains both lexical and conceptual information: {land, river, lake, ...} ∪ {GEO, MOTION, ...}. The initial CR is informative enough to disambiguate a passage containing a deer near the river bank in the input text. The initial disambiguation step produces the sense tags deer/ANIMAL and bank/GEOGRAPHY, but certain instances of bank are left untagged for lack of relevant WSD knowledge. 
For instance, the GEO-bank sense in the context of vole is unresolved, since no information links the ANIMAL context to the GEOGRAPHY sense of bank. The adaptation step adds deer and ANIMAL to the contextual representation for GEO-bank. The enriched CR therefore contains information capable of disambiguating the instance of bank in the context of vole, producing the final disambiguation result.</Paragraph> </Section> <Section position="4" start_page="238" end_page="239" type="metho"> <SectionTitle> 2 Acquiring Conceptual Knowledge from MRD </SectionTitle> <Paragraph position="0"> In this section we apply the TopSense algorithm (Chen and Chang 1998) to acquire CRs for MRD senses. The current implementation of TopSense uses the topical information in the Longman Lexicon of Contemporary English (McArthur 1992, LLOCE) to represent WSD knowledge for LDOCE senses. The following subsections describe how this is done.</Paragraph> <Section position="1" start_page="238" end_page="238" type="sub_section"> <SectionTitle> 2.1 Contextual Representation from MRDs </SectionTitle> <Paragraph position="0"> A dictionary is a text whose subject matter is a language. The purpose of a dictionary is to provide definitions of word senses, and in the process it supplies knowledge not just about the language but about the world (Wilks et al. 1990). A good-sized dictionary usually has a large vocabulary and good coverage of word senses useful for WSD. However, short MRD definitions and examples per se lack the level of abstraction needed to function effectively as a contextual representation of a word sense. A thesaurus, on the other hand, organizes word senses into a fixed set of coarse semantic categories and thus could serve as the basis of a conceptual CR of a word sense.</Paragraph> <Paragraph position="1"> To get the best of both dictionary and thesaurus, we propose to link an MRD sense to thesaurus categories to produce a conceptual representation of its context. 
Content words extracted directly from the definition sentence of a word sense can be used as the word-level contextual representation of that word sense.</Paragraph> <Paragraph position="2"> One way of producing such a conceptual CR is to link MRD senses to their relevant thesaurus senses and categories. These links furnish the MRD senses with the information necessary for building a conceptual CR. We describe one such approach, under which each MRD sense is linked to a relevant thesaurus sense according to its defining words. The linked thesaurus sense, unlike the isolated MRD sense, falls within a certain semantic category.</Paragraph> <Paragraph position="3"> Consequently, we can establish relations between defining words and semantic categories that eventually lead to a conceptual CR.</Paragraph> <Paragraph position="4"> With the word list of each thesaurus category cast as a document representing a certain subject matter or topic, the task of constructing a conceptual representation of context for a given MRD sense closely resembles the document retrieval task in information retrieval (IR) research. Relatively well-established IR techniques for weighting terms and ranking documents are applied to build a list of the topics most relevant to the definition of each MRD sense. This ranked list of topics, for a particular word sense, forms a vectorized conceptual representation of context in the space of all possible topics.</Paragraph> </Section> <Section position="2" start_page="238" end_page="239" type="sub_section"> <SectionTitle> 2.2 Illustrative Example </SectionTitle> <Paragraph position="0"> One example is given in this subsection to illustrate how TopSense works.</Paragraph> <Paragraph position="1"> Example 1. 
Conceptual representation of the LDOCE sense crane.1.n.1, a machine for lifting and moving heavy objects by means of a very strong rope or wire fastened to a movable arm (JIB).</Paragraph> <Paragraph position="2"> For the topics most relevant to this fine-grained sense, we get the following ranked list: Hd (EQUIPMENT), Ha (MATERIALS), Ma (MOVING).</Paragraph> <Paragraph position="3"> Furthermore, the surface-level definition and examples of a particular sense seldom provide sufficient information to represent the context of the sense. For instance, the words machine, lift, move, heavy, object, strong, rope, wire, fasten, movable, arm, jib in the definition of the sense crane.1.n.1 are hardly enough contextual information to resolve the crane.1.n.1 instance in the Brown corpus shown below: Unsinkable slowed and stopped, hundreds of brilliant white flares swayed eerily down from the black, the air raid sirens ashore rose in a keening shriek, the anti-aircraft guns coughed and chattered- and above it all motors roared and the bombs came whispering and wailing and crashing down among the ships at anchor at Bari. They had come from airports in the Balkans, these hundred-odd Junkers 88's.</Paragraph> <Paragraph position="4"> They had winged over the Adriatic, they had taken Bari by complete surprise and now they were battering her, attacking with deadly skill. They had ruined the radar warning system with their window, they had made themselves invisible above their flares. And they also had the lights of the city, the port wall lanterns, and a shore crane's spotlight to guide on.</Paragraph> <Paragraph position="5"> However, with the level of abstraction made possible by a thesaurus, it is not difficult to build a conceptual CR of a word sense, which is intuitively more effective for WSD. 
For instance, based on LLOCE topics, the conceptual CR (EQUIPMENT, MATERIALS, MOVING) derived from the definition of crane.1.n.1 is general enough to characterize many salient words appearing in the context of the crane.1.n.1 instance, including motor (EQUIPMENT), lantern (EQUIPMENT), and flare (EQUIPMENT, MATERIALS).</Paragraph> </Section> </Section> <Section position="5" start_page="239" end_page="240" type="metho"> <SectionTitle> 3 The Adaptive WSD Algorithm </SectionTitle> <Paragraph position="0"> We sum up the above descriptions and outline the procedure for the algorithm in this section.</Paragraph> <Paragraph position="1"> In what follows, an adaptive disambiguation algorithm based on a class-based approach is described. Next, we give an illustrative example to show how the proposed algorithm works for unrestricted text.</Paragraph> <Section position="1" start_page="239" end_page="239" type="sub_section"> <SectionTitle> 3.1 The algorithm </SectionTitle> <Paragraph position="0"> The proposed algorithm starts with the step of initial disambiguation using the contextual representation CR(W, S) derived from the MRD for the sense S of the head entry W. An adaptation step follows, producing a knowledge base from the partially disambiguated text.</Paragraph> <Paragraph position="1"> Finally, the undisambiguated part is disambiguated according to the newly acquired knowledge base. The following algorithm gives a formal and detailed description of adaptive WSD.</Paragraph> </Section> <Section position="2" start_page="239" end_page="240" type="sub_section"> <SectionTitle> 3.2 Illustrative Example </SectionTitle> <Paragraph position="0"> Consider the following passage from the Brown corpus: ... Of cattle in a pasture without throwin' 'em together for the purpose was called a &quot;pasture count&quot;. The counters rode through the pasture countin' each bunch of grazin' cattle, and drifted it back so that it didn't get mixed with the uncounted cattle ahead. 
This method of countin' was usually done at the request, and in the presence, of a representative of the bank that held the papers against the herd. The notes and mortgages were spoken of as &quot;cattle paper&quot;. A &quot;book count&quot; was the sellin' of cattle by the books, commonly resorted to in the early days, sometimes much to the profit of the seller. This led to the famous sayin' in the Northwest of the &quot;books won't freeze&quot;. This became a common byword durin' the ...</Paragraph> <Paragraph position="1"> In our experiment, we observed that hold and paper are related to both the MONEY and the ROAD sense in the initial knowledge base.</Paragraph> <Paragraph position="2"> Thus, this instance of bank is left unresolved in the initial disambiguation step. The adaptation step discovers that both hold and paper co-occur with some MONEY-bank instances in the partially disambiguated text. Therefore, the system is able to correctly resolve this bank instance to the MONEY sense.</Paragraph> </Section> </Section> </Paper>