File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/98/w98-0704_intro.xml
Size: 4,356 bytes
Last Modified: 2025-10-06 14:06:45
<?xml version="1.0" standalone="yes"?> <Paper uid="W98-0704"> <Title>The Use of WordNet in Information Retrieval</Title> <Section position="3" start_page="0" end_page="31" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Development of WordNet began in 1985 at Princeton University (Miller, 1990). A team led by Prof. George Miller aimed to create a source of lexical knowledge whose organization would reflect some of the recent findings of psycholinguistic research into the human lexicon. WordNet has been used in numerous natural language processing tasks, such as part-of-speech tagging (Segond et al., 1997), word sense disambiguation (Resnik, 1995), text categorization (Gomez-Hidalgo and Rodriguez, 1997), information extraction (Chai and Biermann, 1997), and so on, with considerable success. However, the usefulness of WordNet in information retrieval applications has been debatable.</Paragraph> <Paragraph position="1"> Information retrieval is concerned with locating documents relevant to a user's information needs from a collection of documents.</Paragraph> <Paragraph position="2"> The user describes his/her information needs with a query which consists of a number of words. The information retrieval system compares the query with documents in the collection and returns the documents that are likely to satisfy the user's information requirements.</Paragraph> <Paragraph position="3"> A fundamental weakness of current information retrieval methods is that the vocabulary that searchers use is often not the same as the one by which the information has been indexed. Query expansion is one way to address this problem.</Paragraph> <Paragraph position="4"> The query is expanded using terms that have a similar meaning or bear some relation to those in the query, increasing the chances of matching words in relevant documents. 
Expansion terms are generally taken from a thesaurus.</Paragraph> <Paragraph position="5"> Obviously, given a query, the information retrieval system must present all useful articles to the user. This objective is measured by recall, i.e., the proportion of all relevant articles that the system retrieves. Conversely, the information retrieval system must not present any useless article to the user. This criterion is measured by precision, i.e., the proportion of retrieved articles that are relevant.</Paragraph> <Paragraph position="6"> Voorhees used WordNet as a tool for query expansion (Voorhees, 1994). She conducted experiments using the TREC collection (Voorhees and Harman, 1997) in which all terms in the queries were expanded using a combination of synonyms, hypernyms, and hyponyms. She set the weights of the words contained in the original query to 1, and used a combination of 0.1, 0.3, 0.5, 1, and 2 for the expansion terms. She then used the SMART Information Retrieval System Engine (Salton, 1971) to retrieve the documents. Through this method, Voorhees succeeded only in improving the performance on short queries, and only a little, with no significant improvement for long queries. She further tried to use WordNet as a tool for word sense disambiguation (Voorhees, 1993) and applied it to text retrieval, but the performance of retrieval was degraded.</Paragraph> <Paragraph position="7"> Stairmand (Stairmand, 1997) used WordNet to compute lexical cohesion according to the method suggested by Morris (Morris and Hirst, 1991), and applied this to information retrieval.</Paragraph> <Paragraph position="9"> He concluded that his method could not be applied to a fully-functional information retrieval system.</Paragraph> <Paragraph position="10"> Smeaton (Smeaton and Berrut, 1995) tried to expand the queries of the TREC-4 collection using various strategies for weighting expansion terms, along with manual and automatic word sense disambiguation techniques. 
Unfortunately, all strategies degraded the retrieval performance.</Paragraph> <Paragraph position="11"> Instead of matching terms in queries and documents, Richardson (Richardson and Smeaton, 1995) used WordNet to compute the semantic distance between concepts or words and then used this term distance to compute the similarity between a query and a document. Although he proposed two methods to compute semantic distances, neither of them increased the retrieval performance.</Paragraph> </Section> </Paper>