<?xml version="1.0" standalone="yes"?>
<Paper uid="A00-3007">
  <Title>Word Sense Disambiguation for Cross-Language Information Retrieval</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> The CINDOR cross-language information retrieval system (Diekema et al., 1998) uses an information structure known as &quot;conceptual interlingua&quot; for query and document representation. This conceptual interlingua is a hierarchically organized multilingual concept lexicon, which is structured following WordNet (Miller, 1990). By representing query and document terms by their WordNet synset numbers, we arrive at an essentially language-neutral representation in which synset numbers stand for concepts. This representation facilitates cross-language retrieval by matching term synonyms in English as well as across languages. However, many terms are polysemous and belong to multiple synsets, resulting in spurious matches in retrieval. The noun figure, for example, appears in 13 synsets in WordNet 1.6. This research paper describes the early stages of our efforts to develop a word sense disambiguation (WSD) algorithm aimed at improving the precision of our cross-language retrieval system.</Paragraph>
  </Section>
</Paper>