<?xml version="1.0" standalone="yes"?> <Paper uid="W02-0807"> <Title>Sense Information for Disambiguation: Confluence of Supervised and Unsupervised Methods</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 2 Dictionary Preparation </SectionTitle> <Paragraph position="0"> CL Research's DIMAP (Dictionary Maintenance Programs) disambiguates open text against WordNet or any other dictionary converted to DIMAP. The dictionaries used for disambiguation operate in the background (as distinguished from the foreground development and maintenance of a dictionary), with rapid lookup to access and examine the multiple senses of a word after a sentence has been parsed.</Paragraph> <Paragraph position="1"> DIMAP allows multiple senses for each entry, with fields for definitions, usage notes, hypernyms, hyponyms, other semantic relations, and feature structures containing arbitrary information.</Paragraph> <Paragraph position="2"> For SENSEVAL-2, WordNet was entirely converted to alphabetic format for use as the disambiguation dictionary. Details of this conversion (which captures all WordNet information) and the creation of a separate &quot;phrase&quot; dictionary for all noun and verb multiword units (MWUs) are described in Litkowski (2001). In disambiguation, the phrase dictionary is examined first for a match, with the full phrase, rather than the single word, then used to identify the sense inventory.</Paragraph> <Paragraph position="3"> NODE was prepared in a similar manner, with several additions. A conversion program transformed the MRD files into various fields in DIMAP, the notable difference being the much richer and more formal structure (e.g., lexical preferences, grammar fields, and subsensing). Conversion also automatically created &quot;kind&quot; and &quot;clue&quot; regular expression phrases under individual headwords; e.g., &quot;(as) happy as a sandboy (or Larry or a clam)&quot; under happy was converted into a collocation pattern for a sense of happy, written &quot;(as|?) ~ as (a sandboy |Larry |a clam)&quot;, with the tilde marking the target word. Further details on this conversion and on the definition parsing used to enrich the sense information are also provided in Litkowski (2001). After parsing was completed, a phrase dictionary was also created for NODE. The SENSEVAL lexical sample tasks (disambiguating one of 73 target words within a text of several sentences) were run independently against the WordNet and NODE sense inventories, with the WordNet results submitted. To investigate the viability of mapping for WSD, subdictionaries were created for each of the lexical sample words. For each word, the subdictionaries consisted of the main word and all entries identifiable from the phrase dictionary for that word. (For bar, in NODE, there were 13 entries where bar was the first word in an MWU and 50 entries where it was the head noun; for begin, there was only one entry.) The NODE dictionaries were then mapped into the WordNet dictionaries (see Litkowski, 1999), using overlap among words and semantic relations.</Paragraph> <Paragraph position="4"> The 73 dictionaries for the lexical sample words gave rise to 1372 WordNet entries and 1722 NODE entries. Only 491 entries (of which 418 were MWUs) were common (i.e., no mappings were available for the remaining 1231 NODE entries, all of which were MWUs); 881 entries in WordNet were therefore inaccessible through NODE.
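As an illustration of the &quot;kind&quot; and &quot;clue&quot; collocation patterns described earlier in this section, the following Python sketch translates the happy example into a standard regular expression and matches it against context. The pattern syntax (a tilde for the target word, &quot;(x|?)&quot; for an optional element) is only illustrated in the paper, so the translation rules below are assumptions rather than DIMAP's actual implementation.

```python
import re

def clue_pattern_to_regex(pattern: str, target: str) -> re.Pattern:
    """Translate a DIMAP-style collocation ("clue") pattern into a regex.

    Assumed conventions: '~' marks the target word, '(x|?)' means x is
    optional, and '|' separates alternatives.
    """
    pat = pattern.replace("~", re.escape(target))  # insert the headword
    pat = pat.replace("(", "(?:")                  # non-capturing groups
    pat = pat.replace("|?)", ")?")                 # '(x|?)' becomes optional x
    pat = re.sub(r"\s*\|\s*", "|", pat)            # trim spaces around '|'
    pat = re.sub(r"\s+", r"\\s+", pat.strip())     # flexible whitespace between tokens
    return re.compile(pat, re.IGNORECASE)

clue = clue_pattern_to_regex("(as|?) ~ as (a sandboy |Larry |a clam)", "happy")
print(bool(clue.search("He was as happy as a clam")))        # True
print(bool(clue.search("She seemed happy as Larry today")))  # True
```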
For the entries in common, there was an average of 5.6 senses, of which only 64% were mappable into WordNet, thus creating our initial impression that use of NODE would not be feasible.3 (Footnote 3: Note that a mapping from WordNet to NODE generates similar mismatch statistics.)</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 Disambiguation Techniques </SectionTitle> <Paragraph position="0"> Details of the disambiguation process are provided in Litkowski (2001). In general, for the lexical sample, the sentence containing the target word was first parsed and the part of speech of the target word was used to select the sense inventory. If the tagged word was part of an MWU, the MWU's sense inventory was used. The dictionary entry for the word was then accessed. Before evaluating the senses, the topic area of the context provided by the sentence was &quot;established&quot;: subject labels for all senses of all content words in the context were tallied.</Paragraph> <Paragraph position="1"> Each sense of the target was then evaluated, based on the available information for the sense, including type restrictions such as transitivity, presence of accompanying grammatical constituents such as infinitives or complements, selectional preferences for verbs and adjectives, form restrictions such as number and tense, grammatical roles, collocation patterns, contextual clue words, contextual overlap with definitions and examples, and topical area matches. Points were given to each sense and the sense with the highest score was selected; in case of a tie, the first sense was selected.</Paragraph> <Paragraph position="2"> (Footnote 2: WordNet definitions were not parsed. An experiment showed the semantic relations identifiable through parsing were frequently inconsistent with those in WordNet.)</Paragraph> <Paragraph position="4"> The top line of Table 1 shows our official results using WordNet as the disambiguation dictionary, with an overall precision (and recall) of 0.293 at the fine-grained level and 0.367 at the coarse-grained level. Disambiguating with NODE immediately after the official submission and mapping its senses into WordNet senses achieved comparable levels of precision, with a coverage of 75% based on the senses that could be mapped into WordNet, even though the NODE coverage was 100%.</Paragraph> <Paragraph position="5"> Since our original submission, we have implemented many additional routines and improved our NODE mapping to WordNet; our revised precision figures, shown in Table 1, are now 0.368 at the fine-grained level and 0.462 at the coarse-grained level using WordNet and 0.337 and 0.427 using NODE.</Paragraph> <Paragraph position="6"> Of particular note, the mapping coverage from NODE to WordNet is now 89%, and precision is comparable except for the verbs.</Paragraph> <Paragraph position="7"> In Litkowski (2002), we examined the mapping from NODE to WordNet in considerable detail.</Paragraph> <Paragraph position="8"> Several of our findings are pertinent to our analysis of the features affecting disambiguation. Table 1 reflects changes to the automatic mapping along with hand changes. The automatic mapping changes account for the change in coverage. The hand mapping shows that the automatic mapping was about 70% accurate. Interestingly, the hand changes did not affect precision. 
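The sense-selection procedure described at the beginning of this section (points for each matching feature, highest score wins, ties resolved to the first sense) can be sketched as follows. The feature names mirror those analyzed in Section 4, but the point values are hypothetical; the paper characterizes its own weighting scheme only as ad hoc.

```python
# Hypothetical feature weights; the paper calls its own scheme ad hoc.
WEIGHTS = {"idiom": 10, "kind": 8, "clue": 6, "with": 5, "form": 4,
           "as": 4, "prefs": 3, "context": 2, "topic": 2}

def choose_sense(senses):
    """Pick the sense with the highest score; ties go to the first sense.

    Each sense is a dict with a 'features' set naming the evidence that
    matched the context (idiom, kind, clue, with, form, as, prefs,
    context, topic).
    """
    best, best_score = senses[0], float("-inf")
    for sense in senses:                        # senses are in dictionary order
        score = sum(WEIGHTS.get(f, 0) for f in sense["features"])
        if score > best_score:                  # strict '>' preserves the tie rule
            best, best_score = sense, score
    return best

senses = [
    {"id": "sense_1", "features": set()},                      # default sense
    {"id": "sense_2", "features": {"topic", "context"}},
    {"id": "sense_3", "features": {"clue", "with", "context"}},
]
print(choose_sense(senses)["id"])   # sense_3
```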
In general, the fact that we were able to achieve a level of precision comparable to WordNet suggests that the most frequent senses of the lexical sample words could be disambiguated and mapped correctly into WordNet.</Paragraph> <Paragraph position="9"> The significant discrepancy in entries between the two dictionaries (all MWUs: 1231 entries in NODE not in WordNet and 881 entries in WordNet not in NODE) in part reflects the usual editorial decisions that would be found in examining any two dictionaries. However, since WordNet is not lexicographically based, many of the differences are indicative of the idiosyncratic development of WordNet. WordNet may identify several types of an entity (e.g., apricot bar and nougat bar), where NODE may use one sense (&quot;an amount of food or another substance formed into a regular narrow block&quot;) without creating separate entries that follow this regular lexical rule.</Paragraph> <Paragraph position="10"> For the most part, verb phrases containing particles are equally present in both dictionaries (e.g., draw out and draw up), but NODE contains several more nuanced phrases (e.g., draw in one's horns, draw someone aside, keep one's figure, and pull oneself together). NODE also contains many idioms where a noun is used in a verb phrase (e.g., call it a day, keep one's mouth shut, and go back to nature).</Paragraph> <Paragraph position="11"> About 100 of our disambiguations using NODE were to MWUs not present in WordNet (20% of our coverage gap).</Paragraph> <Paragraph position="12"> Of most significance to the sense mapping is the classical problem of splitting (attaching more importance to differences than to similarities, resulting in more senses) and lumping (attaching more significance to similarities than to differences, resulting in fewer senses). Splitting accounts for the remaining 80% of our coverage gap (where NODE identified senses not present in WordNet). The effect of lumping is more difficult to assess. When a NODE definition corresponds to more than one sense in WordNet, we may disambiguate correctly in NODE, but receive no score since we have mapped into the wrong definition; the WordNet sense groupings may allow us to receive credit at the coarse grain, but not at the fine grain. We have examined this issue in more detail in Litkowski (2002), with the conclusion that lumping reduces our NODE score since we are unable to pick out the single WordNet sense answer.</Paragraph> <Paragraph position="13"> More problematic for our mapping was the absence of crucial information in WordNet. Delfs (2001) described a sense for begin that takes an infinitive complement, but this information is present only in an example sentence and is not explicitly encoded with the usual WordNet verb frame. Similarly, for train, two sentences were &quot;tagged to transitive senses despite being intransitive because again we were dealing with an implied direct object, and the semantics of the sense that was chosen fit; we just pretended that the object was there.&quot; In improving our disambiguation routines, it will be much more difficult to glean the appropriate criteria for sense selection in WordNet without this explicit information than to obtain it in NODE and map it into WordNet. 
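The NODE-to-WordNet sense mapping discussed throughout this section relies on overlap among definition words and semantic relations (Litkowski, 1999). A minimal sketch of a word-overlap score of this kind follows; the stopword list, sense labels, and glosses are illustrative assumptions, not the DIMAP data or procedure.

```python
STOPWORDS = {"a", "an", "the", "of", "or", "to", "in", "and"}

def tokens(text):
    """Lowercased content tokens from a definition string."""
    return {w for w in text.lower().split() if w not in STOPWORDS}

def map_sense(node_def, wordnet_defs):
    """Map a NODE definition to the WordNet sense whose gloss shares the
    most words with it (Jaccard overlap); returns None if nothing overlaps."""
    node_words = tokens(node_def)
    best, best_score = None, 0.0
    for sense_id, gloss in wordnet_defs.items():
        union = node_words | tokens(gloss)
        overlap = len(node_words & tokens(gloss)) / len(union) if union else 0.0
        if overlap > best_score:
            best, best_score = sense_id, overlap
    return best

# Illustrative sense labels and glosses, not actual WordNet identifiers.
wn_bar = {
    "bar_rod": "a rigid piece of metal or wood",
    "bar_counter": "a counter where you can obtain food or drink",
}
print(map_sense("a counter across which drinks are served", wn_bar))  # bar_counter
```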
Much of this information is either not available in WordNet, available only in an unstructured way, only implicitly present, or inconsistently present.</Paragraph> </Section> <Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> 4 Feature Analysis Methodology </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.1 Identifying Disambiguation Features </SectionTitle> <Paragraph position="0"> As indicated above, our disambiguation routines assign point values based on a judgment of how important each feature seems to be. The weighting scheme is ad-hoc. For the feature analysis, we simply recorded a binary variable for each feature that had made a contribution to the final sense selection. In particular, we identified the following features: (1) whether the sense selected was the default (first) sense (i.e., no other features were identified in examining any of the senses), (2) whether the identified sense was based on the occurrence of the target word in an idiom, (3) whether a type (specifically, transitivity) factored into the sense selection, (4) whether the selected sense had any syntactic or semantic clues, (5) whether a subcategorization pattern figured into the sense selection, (6) whether the sense had a specified word form (e.g., capitalization, tense, or number), (7) whether a syntactic usage was relevant (e.g., nouns as modifiers or an adjective being used as a noun, such as &quot;the blind&quot;), (8) whether a selectional preference was satisfied (for verb subjects and objects and adjective modificands), (9) whether we were able to use a Lesk-style context clue from the definitions or an example, and (10) topic area (e.g., subject fields, usage labels, or register labels associated with definitions).</Paragraph> <Paragraph position="1"> As the disambiguation algorithm proceeded, we recorded each of the features associated with each sense. After a sense was selected, the features associated with that sense were written to a file (as a hexadecimal number) for subsequent analysis. We sorted the senses for each target word in the lexical sample and summarized the features that were used for all instances that had the same sense. We then summarized the features over all senses and further summarized them by part of speech. These results are shown in Table 2.</Paragraph> <Paragraph position="2"> The first column shows the number of instances for each part of speech and overall. The second column shows the number of instances where the disambiguation algorithm selected the default sense.</Paragraph> <Paragraph position="3"> These cases indicate the absence of positive information for selecting a sense and may be construed as indicating that the sense inventory may not make sufficient sense distinctions. The default numbers are somewhat misleading for verbs, where the mere presence of an object (recorded in the &quot;with&quot; column) sufficed to make a selection &quot;non-default&quot;. As well, the default selections may indicate that our disambiguation does not yet make full use of the distinctions that are available. As we make improvements in our algorithm, we would expect the number of default selections to decrease.</Paragraph> <Paragraph position="4"> The significant difference in the number of default selections between WordNet and NODE is a broad indicator that there is more information available in NODE than in WordNet. 
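Section 4.1 records which of the ten features contributed to each sense selection and writes the result to a file as a hexadecimal number. A minimal sketch of such an encoding follows; the bit assignment is an assumption, since the paper does not specify how the features are packed.

```python
# The ten disambiguation features listed in Section 4.1, in a fixed
# (illustrative) bit order.
FEATURES = ["default", "idiom", "kind", "clue", "with",
            "form", "as", "prefs", "context", "topic"]

def encode_features(active):
    """Pack the set of features that fired for the chosen sense into an int."""
    bits = 0
    for i, name in enumerate(FEATURES):
        if name in active:
            bits |= 1 << i
    return bits

def decode_features(bits):
    """Recover the feature names from a stored hexadecimal value."""
    return {name for i, name in enumerate(FEATURES) if bits & (1 << i)}

value = encode_features({"idiom", "context", "topic"})
print(f"{value:03x}")                   # 302, the hex record written to the analysis file
print(decode_features(int("302", 16)))  # {'idiom', 'context', 'topic'} (set order may vary)
```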
In examining the results for individual words, even in cases where the &quot;default&quot; (or first) sense was being selected, the decision was being made in NODE based on positive information rather than the absence of information.</Paragraph> <Paragraph position="5"> Generally (but not absolutely), the intent of the compilers of both WordNet and NODE is that the first sense correspond to the most frequent sense. The prominence of the default sense in our results underscores the importance of ensuring that this is the case. In a few instances, the first NODE sense did not correspond to the first WordNet sense, and we were able to obtain a much better result disambiguating in NODE than in WordNet by using an appropriate mapping from NODE to a second or third WordNet sense. The default sense also matters for the selection of instances in an evaluation such as SENSEVAL; if the instances do not reflect common usage, WSD results may be biased simply because of the instance selection.</Paragraph> <Paragraph position="6"> The &quot;idiom&quot; column indicates those cases where a phrasal entry was used to provide the sense inventory. As pointed out above, these correspond to the MWUs that were created and account for over 10% of the lexical instances.</Paragraph> <Paragraph position="7"> The &quot;kind&quot; and &quot;clue&quot; columns correspond to strong and slightly weaker collocational patterns, respectively, that have been associated with individual senses.</Paragraph> <Paragraph position="8"> These correspond to similarly named sense attributes used in the Hector database for SENSEVAL-1, which was the experimental basis for NODE. As can be seen in the table, these were relevant to the sense selection for about 6.5 percent of the instances for NODE. We converted several of WordNet's verb frames into clue format; however, they did not show up as features in our analysis, probably because our implementation needs to be improved. We expect that further improvements will yield some cases where these are relevant in the WordNet disambiguation (as well as increasing the number of cases where these are relevant to NODE senses).</Paragraph> <Paragraph position="9"> The context column reflects the significance of Lesk-style information available in the definitions and examples. In general, it appears that about a third of the lexical instances were able to use this information. This reflects the extent to which the dictionary compilers are able to provide good examples for the individual senses. Since space is limited for such examples, our results indicate that there will be an inevitable upper limit on the extent to which disambiguation can rely on such information (a conclusion also reached by Haynes (2001)).</Paragraph> <Paragraph position="10"> The potential significance of subject or topic fields associated with individual senses is indicated by the number of cases where NODE was able to use this information (nearly 20 percent of the instances). NODE makes extensive use of subject labels, particularly in the MRD. We included many subject labels, usage labels, and register labels in our WordNet conversion, but these did not surface in our disambiguation with WordNet; they were very rare for the lexical items used in SENSEVAL. The value shown here is similar to the results obtained by Magnini et al.
(2001), but their low recall suggests that for more common words, there will be less opportunity for their use.</Paragraph> <Paragraph position="11"> The word form of a lexical item also emerged as being of some significance when disambiguating with NODE, figuring in slightly over 16 percent of the instances. In NODE, this is captured by such labels as &quot;often capitalized&quot; or &quot;often in plural form&quot;. No comparable information is available in WordNet.</Paragraph> <Paragraph position="12"> Subcategorization patterns (indicated under the &quot;with&quot; column) were very important in both WordNet (based on the verb frames) and NODE, relevant in 55% and 32% of the sense selections, respectively. As indicated, the &quot;with&quot; category is also important for nouns. For the most part, this indicates that a given noun sense is usually accompanied by a noun modifier (e.g., &quot;metal fatigue&quot;).</Paragraph> <Paragraph position="13"> The &quot;as&quot; column corresponds to nouns used as modifiers, verbs used as adjectives, and adjectives used as nouns. These were fairly important for nouns (7.7%) and verbs (10.3%).</Paragraph> <Paragraph position="14"> The final column, &quot;prefs&quot;, corresponds to selectional preferences for verb subjects and objects and adjective modificands. In these cases, a match occurred when the head noun in these positions matched literally, was a synonym, or was within two synsets in the WordNet hierarchy. Although the results were relatively small, this demonstrates the viability of using such preferences.</Paragraph> <Paragraph position="15"> Finally, anomalous entries in the table (e.g., nouns having subcategorization patterns used in the sense selection) generally correspond to our parser incorrectly assigning a part of speech (i.e., treating the noun as a verb sense).</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.2 Variation in Disambiguation Features </SectionTitle> <Paragraph position="0"> Space precludes showing the variation in features by lexical item. The attributes in NODE for individual items vary considerably, and the differences were reflected in which features emerged as important.</Paragraph> <Paragraph position="1"> For adjectives, idiomatic usages were significant for free, green, and natural. Topics were important for fine, free, green, local, natural, oblique, and simple, indicating that many senses of these words have specialized meanings. Form was important for blind, arising from the collocation &quot;the blind&quot;. The default sense was most prominent for colorless, graceful (with only one sense in NODE), and solemn. Context was important for blind, cool, fine, free, green, local, natural, oblique, and simple, suggesting that these words participate in common expressions that can be captured well in a few choice examples. Selectional preferences on the modificands were useful in several instances.</Paragraph> <Paragraph position="2"> For nouns, idioms were important for art, bar, channel, church, circuit, and post. Clues (i.e., strong collocations) were important for art, bar, chair, grip, post, and sense. Topics were important for bar, channel, church, circuit, day, detention, mouth, nation, post, spade, stress, and yew (even though yew had only one sense in NODE). Context was important for art, authority, bar, chair, channel, child, church, circuit, day, detention, facility, fatigue, feeling, grip, hearth, lady, material, mouth, nature, post, and restraint. 
The presence of individual lexical items in several of these groupings shows the richness of variations in characteristics, particularly into specialized usages and collocations.</Paragraph> <Paragraph position="3"> For verbs, idioms were important for call, carry, draw, dress, live, play, pull, turn, wash, and work, a reflection of the many entries where these words were paired with a particle. Form was an important feature for begin (over 50% of the instances), develop, face, find, leave, match, replace, treat, and work.</Paragraph> <Paragraph position="4"> Subcategorization patterns were important for all the verbs. However, many verb senses in both WordNet and NODE do not show wide variation in their subcategorization patterns and are insufficient in themselves to distinguish senses. Strong (&quot;kind&quot;) and weak (&quot;clue&quot;) collocations are relatively less important, except for a few verbs (collaborate, serve, and work). Topics are surprisingly significant for several verbs (call, carry, develop, dress, drive, find, play, pull, serve, strike, and train), indicating the presence of specialized senses. Context does not vary significantly among the set of verbs, but it is a feature in one-third of the sense selections. Finally, selectional preferences on verb subjects and objects emerged as having some value.</Paragraph> </Section> </Section> <Section position="6" start_page="0" end_page="0" type="metho"> <SectionTitle> 5 Generalizability of Feature Analysis, Relation to Supervised Learning, and </SectionTitle> <Paragraph position="0"> The use of feature analysis has advanced our perception of the disambiguation process. To begin with, by summarizing the features used in the sense selection, the technique identifies overall differences between sense inventories. While our comments have focused on information available in NODE, they reflect only what we have implemented. Many opportunities still exist, and the results will help us identify them.</Paragraph> <Paragraph position="1"> In developing our feature analysis techniques, we made lists of the features available for the senses of a given word. This gradually gave rise to the notion of a &quot;feature signature&quot; associated with each sense. In examining the set of definitions for each lexical item, an immediate question is how the feature signatures differ from one another. This allows us to focus on the issue of adequate sense distinctions: what it is that distinguishes each sense.</Paragraph> <Paragraph position="2"> The notion of feature signatures also raises the question of their correspondence to supervised learning techniques such as the feature selection of Mihalcea & Moldovan (2001) and the decision lists used in WASPS (Tugwell & Kilgarriff, 2001). This raises the possibility of precompiling a sense inventory and revising our disambiguation strategy to identify the characteristics of an instance's use and then simply to perform a boolean conjunction to narrow the set of viable senses (sketched below).</Paragraph> <Paragraph position="3"> The use of feature signatures also allows us to examine our mapping functionality. As indicated above, we are unable to map 10 percent of the senses from NODE to WordNet, and of our mappings, approximately 33 percent have appeared to be inaccurate when examined by hand. 
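A minimal sketch of the boolean conjunction over feature signatures proposed above follows; the headword, sense labels, and signature contents are invented purely for illustration.

```python
# A toy "feature signature" inventory for one headword: each sense is
# characterized by the features observed to select it. The labels and
# signatures here are hypothetical.
SIGNATURES = {
    "bar_1 (rod)":    {"with:noun-modifier", "topic:engineering"},
    "bar_2 (tavern)": {"topic:drink", "context:counter"},
    "bar_3 (legal)":  {"topic:law", "form:the-bar"},
}

def narrow_senses(observed):
    """Keep only the senses whose signature is consistent with the
    features observed for this instance (a boolean conjunction)."""
    return [sense for sense, sig in SIGNATURES.items() if sig <= observed]

print(narrow_senses({"topic:law", "form:the-bar", "context:counter"}))
# ['bar_3 (legal)']
```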
When we examine the instances where we selected a sense in NODE but were unable to map to a WordNet sense, we can use these instances to identify clear cases where there is no WordNet sense.</Paragraph> <Paragraph position="4"> In connection with the use of supervised learning techniques, participants from other teams have provided us with the raw data with which their systems made their sense selections. The feature arrays from (Mihalcea & Moldovan, forthcoming) identify many features in common with our set. For example, they used the form and part of speech of the target word; this corresponds to our &quot;form&quot; feature. Their collocations, prepositions after the target word, nouns before and after, and prepositions before and after correspond to our idioms, &quot;clues&quot;, and &quot;with&quot; features. The array of grammatical relations used with WASPS (Tugwell & Kilgarriff, 2001), such as bare-noun, plural, passive, ing-complement, noun-modifier, and PP-comp, corresponds to our &quot;form&quot;, &quot;clue&quot;, &quot;with&quot;, and &quot;as&quot; features. The data from these teams also identify bigrams and other context information. Pedersen (2001) also provided us with the output of several classification methods, identifying unigrams and bigrams found to be significant in sense selection. These data correspond to our &quot;context&quot; feature.</Paragraph> <Paragraph position="5"> We have begun to array all these data by sense, corresponding to our detailed feature analysis. Our initial qualitative assessment is that there are strong correspondences among the different data sets. We will examine these quantitatively to assess the significance of the various features. In addition, while several features are already present in WordNet and NODE, we fully expect that these other results will help us to identify features that can be added to the NODE sense inventory.</Paragraph> </Section> </Paper>