<?xml version="1.0" standalone="yes"?> <Paper uid="W00-0801"> <Title>An Unsupervised Method for Multilingual Word Sense Tagging Using Parallel Corpora: A Preliminary Investigation</Title> <Section position="6" start_page="4" end_page="7" type="relat"> <SectionTitle> 4. Related Work </SectionTitle> <Paragraph position="0"> Many unsupervised methods have been proposed in the literature to address the problem of sense ambiguity in language. All of the reported unsupervised methods use monolingual materials and are therefore comparable to the results obtained on the target tag set of our preliminary investigation. However, differences in knowledge resources and evaluation material make a direct comparison hard to establish. For instance, Yarowsky [1992, 1995] reports the highest accuracy to date for an unsupervised method, a mean of 92%, yet his evaluation used a knowledge resource, Roget's thesaurus, whose sense representation is coarser-grained than that of WordNet.</Paragraph> <Paragraph position="1"> The results most comparable to our preliminary results are those reported by Resnik [1997], since he used the same corpus and evaluated against the same test set. He did not restrict his evaluation to nouns. Resnik proposed an unsupervised method for sense disambiguation based on selectional preference information, which uses grammatical relations between words in a corpus to arrive at the correct sense for a word. He reports an average accuracy of 40.1% over five grammatical relations. Resnik's method, however, explores a different dimension of meaning: it uses a linguistically motivated context window, which we expect to be very useful when combined with our approach, for example in examining the verb data.</Paragraph> <Paragraph position="2"> The work most closely related to ours is that of Ide [in press]. 
Ide explores the question of whether using cross-linguistic information for sense distinction is worth pursuing. She reports a preliminary analysis of translation equivalents in four different languages of George Orwell's Nineteen Eighty-Four. The translations were human translations, i.e., natural parallel corpora. In her study, only four words were considered.</Paragraph> <Paragraph position="3"> Native speakers of the four respective languages manually aligned the chosen English words to their foreign translations. The goal of her research was to explore the degree to which words are lexicalized differently in translated text. Ide classifies translation types by how much they vary in what they align with in translation, for example, whether a word aligns with a single word, a phrase, or nothing. She reports that in Nineteen Eighty-Four, only 86.6% of the English words have a single lexical item used in the translation. This suggests that alignment methods targeting single-word-to-single-word alignments can yield an upper bound of 86.6% on this specific corpus. It will be interesting to conduct a similar study on the Brown corpus.</Paragraph> </Section> </Paper>