<?xml version="1.0" standalone="yes"?> <Paper uid="C04-1192"> <Title>Fine-Grained Word Sense Disambiguation Based on Parallel Corpora, Word Alignment, Word Clustering and Aligned Wordnets</Title> <Section position="4" start_page="2" end_page="2" type="concl"> <SectionTitle> 4 Conclusion </SectionTitle> <Paragraph position="0"> Considering the fine granularity of the PWN2.0 sense inventory, our disambiguation results using parallel resources are superior to the state of the art in monolingual WSD (with the same sense inventory). This is not surprising, since parallel texts contain implicit knowledge about the sense of an ambiguous word, provided by the human translators. The drawback of our approach is that it relies on the existence of parallel data, which in the vast majority of cases is not available. On the other hand, supervised monolingual WSD relies on the existence of large samples of training data, and our method can be applied to produce such data to bootstrap monolingual applications. Given that parallel resources are becoming increasingly available, in particular on the World Wide Web (see, for instance, http://www.balkantimes.com where the same news is published in 10 languages), and that aligned wordnets are being produced for more and more languages, it should be possible to apply our method and similar ones to large amounts of parallel data in the not-too-distant future.</Paragraph> <Paragraph position="1"> One of the greatest advantages of our approach is that it can be used to automatically sense-tag corpora in several languages at once.</Paragraph> <Paragraph position="2"> That is, if we have a parallel corpus in multiple languages (such as the Orwell corpus), disambiguation performed on any one of these languages propagates to the rest via the ILI linkage. 
Also, given that the vast majority of words in any given language are monosemous (e.g., approximately 82% of the words in PWN have only one sense), the use of parallel corpora in multiple languages for WSD offers the potential to significantly improve results and provide substantial sense-annotated corpora for training in a range of languages.</Paragraph> </Section> </Paper>