File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/97/w97-0208_metho.xml
Size: 12,994 bytes
Last Modified: 2025-10-06 14:14:39
<?xml version="1.0" standalone="yes"?> <Paper uid="W97-0208"> <Title>Sense Tagging: Semantic Tagging with a Lexicon</Title> <Section position="4" start_page="47" end_page="48" type="metho"> <SectionTitle> 3 Comparing Different Approaches </SectionTitle> <Paragraph position="0"> Approach 2a is the least promising since text tagged with word senses is practically non-existent and is both time consuming and difficnlt to produce manually. Much of the research in this area has been compromised by the fact that researchers have focussed on lexical ambiguities that are not true word sense distinctions, such as words translated differently across two languages (Gale, Church, and Yarowsky, 1992) or homophones ~ (Yarowsky, 1993).</Paragraph> <Paragraph position="1"> Even in the cases where data with the appropriate sense distinctions is available, the text is unliicely to be from the desired domain: a word sense discriminator trained on company news text will be much less effective on text about electronics products. A discriminator trained on many types of text so as to be generic will not be particularly successful in any specific domain.</Paragraph> <Paragraph position="2"> Approach 2b has received much attention recently.</Paragraph> <Paragraph position="3"> Its disadvantage is that sense disambiguation is not carried out relative to any well defined set of senses, but rather an ad hoc set. Although this research has been the most successful of all approa~es, it is difficult to see what use could be made of the word sense distinctions produced.</Paragraph> <Paragraph position="4"> Using approach 1 with hand crafted lexicons has the disadvantage of being expensive to create: in- null essentially hand crafted disambiguators. They reported that the word expert for &quot;throw&quot; is &quot;currently six pages long, but should be ten times that size&quot;, making this approach impractical for any system aiming for broad coverage.</Paragraph> </Section> <Section position="5" start_page="48" end_page="48" type="metho"> <SectionTitle> 4 Proposed Approach </SectionTitle> <Paragraph position="0"> Word senses are not absolute or Platonic but defined by a given lexicon, as has been known for many years from early work on WSD, even though the contrary seems widely believed: &quot;.. it is very difficult to assign word occurrences to sense classes in any manner that is both general and determinate. In the sentences &quot;I have a stake in this country.&quot; and &quot;My stake in the last race was a pound&quot; is &quot;stake&quot; being used in the same sense or not? If &quot;stake&quot; can be interpreted to mean something as vague as 'Stake as any kind of investment in any enterprise' then the answer is yes. So, if a semantic dictionary contained only two senses for &quot;stake&quot;: that vague sense together with 'Stake as a post', then one would expect to assign the vague sense for both the sentences above. But if, on the other hand, the dictionary distinguished 'Stake as an investment' and 'Stake as an initial payment in a game or race' then the answer would be expected to be different. So, then, word sense disambiguation is relative to the dictionary of sense choices available and can have no absolute quality about it.&quot; (Wilks, 1972) There is no general agreement over the number of senses appropriate for lexical entries: at one end of the spectrum Wierzbicka (Wierzbicka, 1989) claims words have essentially one sense while Pustejovsky believes that &quot;... words can assume a potentially ini~nite number of senses in context.&quot;(Pustejovsky, 1995) How, then, are we to get an initial lexicon of word senses? We believe the best resource is still a Machine Readable Dictionary: they have a relatively well-defined set of sense tags for each word and lexical coverage is high.</Paragraph> <Paragraph position="1"> MRDs are, of course, normally generic, and much practical WSD work is for sub-domains. We are adhering to the view that it is better to start with such a generic lexicon and adapt it automatically with specialist words and senses. The work described here is part of ECRAN (Wilks, 1995), a European LRE project on tuning lexicons to domains, with a general sense tagging module used as a first stage.</Paragraph> </Section> <Section position="6" start_page="48" end_page="49" type="metho"> <SectionTitle> 5 Knowledge Sources </SectionTitle> <Paragraph position="0"> An interesting fact about recent word sense disambiguation algorithms is that they have made use of different, orthogonal, sources of information: the in-</Paragraph> <Paragraph position="2"> formation provided by each source seems independent of and has no bearing on any of the others. We propose a tagger that makes use of several types of information (dictionary definitions, parts-of-speech, domain codes, selectional preferences and collocates) in the tradition of McRoy (McRoy, 1992) although, the information sources we use are orthogonal, unlike the sources she used, making it easier to evaluate the performance of the various modules.</Paragraph> <Section position="1" start_page="48" end_page="48" type="sub_section"> <SectionTitle> 5.1 Part-of-speech </SectionTitle> <Paragraph position="0"> It has already been shown that part-of-speech tags are a useful discriminator for semantic disambiguation (Wilks and Stevenson, 1996), although they are not, normally, enough to fully disambiguate a text.</Paragraph> <Paragraph position="1"> For example knowing &quot;bank&quot; in &quot;My bank is on the corner.&quot; is being used as a noun will tell us that the word is not being used in the 'plane turning corner' sense but not whether it is being used in the 'financial institution' or 'edge of river' senses. Part-of-speech tags can provide a valuable step towards the solution to sense tagging: fully disambiguating about 87% of ambiguous word tokens and reducing the ambiguity for some of the rest.</Paragraph> </Section> <Section position="2" start_page="48" end_page="48" type="sub_section"> <SectionTitle> 5.2 Domain codes (Thesaural categories) </SectionTitle> <Paragraph position="0"> Pragmatic domain codes can be used to disambiguate (usually nominal) senses, as was shown by (Bruce and Guthrie, 1992) and (Yarowsky, 1992).</Paragraph> <Paragraph position="1"> Our intuition here is that disambiguation evidence can be gained by choosing senses which are closest in a thesanral hierarchy. Closeness in such a hierarchy can be effectively expressed as the number of nodes between concepts. We are implementing a simple algorithm which prefers close senses in our domain hierarchy which was derived from LDOCE (Bruce and Guthrie, 1992).</Paragraph> </Section> <Section position="3" start_page="48" end_page="48" type="sub_section"> <SectionTitle> 5.3 Collocates </SectionTitle> <Paragraph position="0"> Recent work has been done using collocations as semantic disambiguators, (Yarowsky, 1993), (Dorr, 1996), particularly for verbs. We are attempting to derive disambiguation information by examining the prepositions as given in the subcategorization frames of verbs, and in the example sentences in LDOCE.</Paragraph> </Section> <Section position="4" start_page="48" end_page="49" type="sub_section"> <SectionTitle> 5.4 Selectionai Preferences </SectionTitle> <Paragraph position="0"> There has been a long tradition in NLP of using selectional preferences for WSD (Wilks, 1972). This approach has been recently used by (McRoy, 1992) and (Mahesh and Beale, 1996). At its best it disambiguates both verbs, adjectives and the nouns they modify at the same time, but we shall use this information late in the disambiguation process when we hope to be reasonably confident of the senses of nouns in the text from processes such as 5.2 and 5.5.</Paragraph> </Section> <Section position="5" start_page="49" end_page="49" type="sub_section"> <SectionTitle> 5.5 Dictionary definitions </SectionTitle> <Paragraph position="0"> Lesk (Lesk, 1986) proposed a method for semantic disambiguation using the dictionary definitions of words as a measure of their semantic closeness and proposed the disambiguation of sentences by computing the overlap of definitions for a sentence.</Paragraph> <Paragraph position="1"> Simmulated annealing, a numerical optimisation algorithm, was used to make this process practical (Cowie, Guthrie, and Guthri~, 1992), choosing an assignment of senses from as many as 10 ldeg choices.</Paragraph> <Paragraph position="2"> The optimisation is carried out by minimising an evaluation function, computed from the overlap of a given configuration of senses. The overlap is the total number of times each word appears more than once in the dictionary definitions of all the senses in the configuration. So that if the word &quot;bank&quot; appeared three times in a given configuration we would add two to the overlap total. This function has the disadvantage that longer definitions are prefered over short ones, since these simply have more words which can contribute to the overlap. Thus short definitions or definitions by synonym are penalised. null We attempted to solve this problem by making a slight change to the method for calculating the overlap. Instead of each word contributing one we normalise it's contribution by the number of words in the definition it came from, so if a word came from a definition with three words it would add one third to the overlap total. In this way long definitions have to have many words contributing to the total to be influential and short definitions are not penalised.</Paragraph> <Paragraph position="3"> We found that this new function lead to a small improvement in the results of the disambiguation, however we do not believe this to be statistically significant.</Paragraph> </Section> </Section> <Section position="7" start_page="49" end_page="49" type="metho"> <SectionTitle> 6 A Basic Tagger </SectionTitle> <Paragraph position="0"> We have recently implemented a basic version of this tagger, initially incorporating only the part-of-speech (5.1) and dictionary definition (5.5) stages in the process, with further stages to be added later.</Paragraph> <Paragraph position="1"> Our tagger currently consists of three modules: used extensively in NLP research and provides a broad set of senses for sense tagging.</Paragraph> <Paragraph position="2"> The text is initially stemmed, leaving only morphological roots, and split into sentences. Then words belonging to a list of stop words (prepositions, pronouns etc.) are removed. For each of the remaining words, e~ of its senses are extracted from LDOCE and stored with that word. The textual definitions in each sense is processed to remove stop words and stem remaining words.</Paragraph> <Paragraph position="3"> 2. The text is tagged using the Brill tagger (Brill, 1992) and a translation is carried out using a manually defined mapping from the syntactic tags assigned by Briil (Penn Tree Bank tags (Marcus, Santorini, and Marcinkiewicz, 1993)) onto the simpler part-of-speech categories associated with LDOCE senses. We then remove all senses whose part-of-speech is not consistent with the one assigned by the tagger, if none of the senses are consistent with the part-of-speech we assume the tagger has made an error and do not remove any senses.</Paragraph> <Paragraph position="4"> 3. The final stage is to use the simulated annealing algorithm to optimise the dictionary deftnition overlap for the remaining senses. This algorithm assigns a single sense to each token which is the tag assodated with that token.</Paragraph> </Section> <Section position="8" start_page="49" end_page="50" type="metho"> <SectionTitle> 7 Example Output </SectionTitle> <Paragraph position="0"> Below is an example of the senses assigned by the system for the sentence &quot;A rapid rise in prices soon eventuated unemployment.&quot; We show the homograph and sense numbers from LDOCE with the stemmed content words from the dictionary definitions which are used to calculate the overlap following the dash.</Paragraph> <Paragraph position="1"> * rapid homograph 1 sense 2 - done short time * rise homograph 2 sense 1 - act grow greater powerful * soon homograph 0 sense 1 - long short time * prices homograph 1 sense 1 - amount money which thing be offer sell buy * unemployment homograph 0 sense 1 - condition lack job The senses have additional information associated which we do not show here: domain codes, part of speech and grammatical information as well as semantic information.</Paragraph> <Paragraph position="2"> The senses for a word in LDOCE are grouped into homographs, sets of senses realeated by meaning. For example, one of the homographs of &quot;bank&quot; means roughly 'things piled up', the different senses distinguishing exactly what is piled up.</Paragraph> </Section> class="xml-element"></Paper>