<?xml version="1.0" standalone="yes"?> <Paper uid="W04-2404"> <Title>Combining Lexical and Syntactic Features for Supervised Word Sense Disambiguation</Title> <Section position="8" start_page="2" end_page="2" type="relat"> <SectionTitle> 6 Related Work </SectionTitle> <Paragraph position="0"> McRoy (1992) was one of the first to use multiple kinds of features for word sense disambiguation, in the semantic interpretation system TRUMP. The system aims to disambiguate all words in the text; it relies extensively on dictionaries and is not corpus-based. Scores are assigned based on morphology, part of speech, collocations, and syntactic cues, and the sense with the highest score is chosen as the intended sense. TRUMP was used to tag a subset of the Wall Street Journal (around 2,500 words) but was not evaluated due to the lack of a gold standard.</Paragraph> <Paragraph position="1"> The LEXAS system of Ng and Lee (1996) uses part of speech, morphology, co-occurrences, collocations, and verb-object relations in a nearest-neighbor implementation.</Paragraph> <Paragraph position="2"> The system was evaluated on the interest data, on which it achieved an accuracy of 87.3%. They studied the utility of the individual features and found collocations to be the most useful, followed by part of speech and morphological form.</Paragraph> <Paragraph position="3"> Lin (1997) takes a supervised approach that is unique in that it does not create a separate classifier for every target word. The system compares the context of the target word with the contexts of training instances that are similar to it; the sense of the target word most similar to these contexts is chosen as the intended sense. Like McRoy, Lin attempts to disambiguate all words in the text, relying on syntactic relations, such as subject-verb and verb-object relations, to capture the context.
The system achieved accuracies between 59% and 67% on the SemCor corpus.</Paragraph> <Paragraph position="4"> Pedersen (2001b) compares decision trees, decision stumps, and a Naive Bayesian classifier to show that bigrams are very useful in identifying the intended sense of a word. On 19 of the 36 tasks in the SENSEVAL-1 data, the accuracies were greater than the best results reported in that event. Bigrams are easily captured from raw text, and these encouraging results mean that they can act as a powerful baseline on which to build more complex systems by incorporating other sources of information. Pedersen points out that decision trees can effectively depict the relations among the various features used; with the use of multiple sources of information, this quality of decision trees gains further significance.</Paragraph> <Paragraph position="5"> Lee and Ng (2002) compare the performance of Support Vector Machines, Naive Bayes, AdaBoost, and decision trees using unigrams, parts of speech, collocations, and syntactic relations. The experiments were conducted on the SENSEVAL-2 and SENSEVAL-1 data. They found that the combination of features achieved the highest accuracy (around 73%) on the SENSEVAL-1 data, irrespective of the learning algorithm. On the SENSEVAL-2 data, collocations (57.2%), part-of-speech tags (55.3%), and syntactic relations (54.2%) each performed better than decision trees using all features.</Paragraph> <Paragraph position="6"> Yarowsky and Florian (2002) performed experiments with different learning algorithms and multiple features.</Paragraph> <Paragraph position="7"> Three kinds of Bayes classifier, decision lists, and a transformation-based learning (TBL) model were used, with collocations, bags of words, and syntactic relations as features. Experiments on the SENSEVAL-2 data revealed that the exclusion of any of the three kinds of features resulted in a significant drop in accuracy.
Both Lee and Ng and Yarowsky and Florian conclude that the combination of features is beneficial.</Paragraph> <Paragraph position="8"> Pedersen (2002) presents a pairwise study of the systems that participated in the SENSEVAL-2 English and Spanish disambiguation exercises. The study treats the systems as black boxes, looking only at the assigned tags, whatever the classifier and sources of information may be. He introduces measures to determine the similarity of the classifications and the optimum results obtainable by combining the systems, and points out that pairs of systems with low similarity and high optimal accuracy are of particular interest, as they are markedly complementary and the combination of such systems is beneficial.</Paragraph> <Paragraph position="9"> Questions still remain regarding the use of multiple sources of information, in particular which features should be combined and what the upper bound is on the accuracies achievable by such combinations. Pedersen (2002) describes how to determine the upper bound when combining two systems. This paper extends that idea to provide measures that determine the upper bound when combining two sets of features in a single disambiguation system. We provide a measure to determine the redundancy in classifications produced using two different feature sets, and we identify particular part-of-speech and parse features that were found to be very useful, as well as the combinations of lexical and syntactic features that worked best on the SENSEVAL-2, SENSEVAL-1, line, hard, serve, and interest data.</Paragraph> </Section></Paper>