<?xml version="1.0" standalone="yes"?> <Paper uid="A88-1014"> <Title>BUILDING A LARGE THESAURUS FOR INFORMATION RETRIEVAL</Title> <Section position="4" start_page="105" end_page="106" type="metho"> <SectionTitle> 5. Testing with SMART and CODER </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="105" end_page="106" type="sub_section"> <SectionTitle> </SectionTitle> <Paragraph position="0"> Several studies have examined the use of lexical and semantic relations in information retrieval. Though one investigation used a special system constructed at IIT (Wang et al. 1985), most of the other work has involved the SMART system. The first version of SMART ran on IBM mainframes; a more modern version was developed to run under the UNIX operating system (Fox 1983b). In SMART, queries and documents are represented simply as sets of terms, so a multi-dimensional vector space can be constructed in which each term corresponds to a different dimension. Queries and documents then correspond to points in that space, and documents are retrieved if they are &quot;near&quot; the query. Because queries are typically short, however, it can be valuable to expand a query with terms related to the original ones (especially given variations in naming practices like those considered in Furnas et al. 1982).</Paragraph> <Paragraph position="1"> In our first experiment, involving a small collection of 82 documents, we found a mild improvement in system performance when all types of related terms (except antonyms) were used in query expansion (Fox 1980).</Paragraph> <Paragraph position="2"> Similar benefits resulted when using a different, larger collection (Evens et al. 1985). In two later studies we again used SMART, but with Boolean queries. Query expansion then involved &quot;ORing&quot; related terms with the original ones. 
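As an illustration of ORing in related terms, here is a minimal Python sketch, assuming a Boolean query represented as nested tuples, a hypothetical table of related terms, and document term weights in [0, 1]. The hypothetical function expand replaces each query term t with an OR of t and its related terms; the hypothetical function pnorm scores a document with the standard unweighted p-norm extended Boolean formulas. It is shown for illustration only and is not the SMART implementation.

```python
"""Minimal sketch (hypothetical names and data): expand a Boolean query by
ORing in related terms, then score a document with unweighted p-norm formulas."""

# Query representation (an assumption): a bare term string, or a tuple
# ("AND", clause, ...) / ("OR", clause, ...) whose clauses are queries too.


def expand(query, related):
    """Replace every term t with ("OR", t, r1, r2, ...) using the related-term table."""
    if isinstance(query, str):
        extra = related.get(query, [])
        return ("OR", query, *extra) if extra else query
    op, *clauses = query
    return (op, *(expand(c, related) for c in clauses))


def pnorm(query, doc, p):
    """Extended Boolean similarity in [0, 1]; doc maps terms to weights in [0, 1]."""
    if isinstance(query, str):
        return doc.get(query, 0.0)  # terms absent from the document weigh 0
    op, *clauses = query
    scores = [pnorm(c, doc, p) for c in clauses]
    n = len(scores)
    if op == "OR":
        # OR rewards any strong clause: ((s1^p + ... + sn^p) / n)^(1/p)
        return (sum(s ** p for s in scores) / n) ** (1.0 / p)
    if op == "AND":
        # AND penalizes any weak clause: 1 - (((1-s1)^p + ... + (1-sn)^p) / n)^(1/p)
        return 1.0 - (sum((1.0 - s) ** p for s in scores) / n) ** (1.0 / p)
    raise ValueError(f"unknown operator: {op}")


if __name__ == "__main__":
    related = {"sheep": ["ewe", "ram"]}  # hypothetical related-term table
    query = ("AND", "sheep", "wool")
    expanded = expand(query, related)
    print(expanded)  # ('AND', ('OR', 'sheep', 'ewe', 'ram'), 'wool')
    doc = {"ewe": 1.0, "wool": 1.0}  # hypothetical document weights
    print(round(pnorm(query, doc, 2), 3), round(pnorm(expanded, doc, 2), 3))  # 0.293 0.701
```

With p = 2, the hypothetical document containing ewe and wool scores about 0.293 against the original query and about 0.701 against the expanded one. As p grows, the OR formula approaches the maximum and the AND formula the minimum of the clause scores (strict Boolean logic on 0/1 weights), while p = 1 reduces both operators to a simple average.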
Once again, improvements resulted, especially when the p-norm scheme for interpreting Boolean queries was applied (Fox 1983a, 1984). In all of these studies, lexical-semantic relations were identified manually for all query terms that were expanded.</Paragraph> <Paragraph position="3"> In other recent work with SMART, Fox, Miller and Sridaran used the same Boolean queries but varied the source of related words. Against the base case of the original, unexpanded queries, they compared three sources of expansion terms: all words linked to the query terms by manually derived lexical-semantic relations; all words (except antonyms) taken from the Merriam Webster Thesaurus; and all words (except those on a &quot;stop&quot; list) from the dictionary definition of the correct word sense. All three expansion schemes gave better results than the base case. While the lexical-semantic relation method seemed best overall, the dictionary results were comparable and the thesaurus approach was only slightly worse.</Paragraph> <Paragraph position="4"> We are convinced that much larger improvements are possible if end-users can be more directly involved in the process, so that they can decide which words should be expanded and select which related terms to include from the lists produced from our thesaurus. Testing this hypothesis, however, requires a more flexible processing paradigm than we have employed in the past. Furthermore, we believe that inferencing over the information in the semantic network we are building can allow us to develop an effective automatic or semi-automatic scheme for &quot;intelligent&quot; query expansion. 
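The user-guided expansion proposed here might work roughly as in the following minimal Python sketch, assuming a thesaurus that maps each term to (relation, related term) pairs. The relation labels, the sample entries, and the functions candidate_lists and apply_selection are hypothetical and do not reflect the actual format of the thesaurus described in this paper. As in the earlier experiments, antonyms are excluded from the candidate lists; the user then picks which candidates to add.

```python
"""Minimal sketch (hypothetical names and data) of user-guided query expansion:
show related terms for each query term, grouped by relation, and add only
the terms the user selects."""

from collections import defaultdict

# Antonyms were left out of query expansion in the experiments described above.
EXCLUDED_RELATIONS = frozenset({"antonym"})


def candidate_lists(query_terms, thesaurus, excluded=EXCLUDED_RELATIONS):
    """Return {term: {relation: [related terms]}} for terms that have candidates."""
    lists = {}
    for term in query_terms:
        grouped = defaultdict(list)
        for relation, related in thesaurus.get(term, []):
            # Skip excluded relations and words already in the query.
            if relation not in excluded and related not in query_terms:
                grouped[relation].append(related)
        if grouped:
            lists[term] = dict(grouped)
    return lists


def apply_selection(query_terms, selections):
    """Append the user's selected terms, preserving order and skipping duplicates."""
    expanded = list(query_terms)
    for term in query_terms:
        for related in selections.get(term, []):
            if related not in expanded:
                expanded.append(related)
    return expanded


if __name__ == "__main__":
    thesaurus = {  # hypothetical entries
        "fast": [("synonym", "quick"), ("synonym", "rapid"), ("antonym", "slow")],
        "car": [("taxonomy", "vehicle"), ("synonym", "automobile")],
    }
    query = ["fast", "car"]
    print(candidate_lists(query, thesaurus))
    # Suppose the user keeps only "rapid" and "automobile".
    print(apply_selection(query, {"fast": ["rapid"], "car": ["automobile"]}))
    # ['fast', 'car', 'rapid', 'automobile']
```

Grouping the candidates by relation lets a user treat, for example, a synonym differently from a broader taxonomic term when deciding what to add.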
The CODER system should support these approaches.</Paragraph> <Paragraph position="5"> Building on early efforts to construct intelligent retrieval systems (Guida and Tasso 1983, Pollitt 1984) and learning from experience with similar systems (Croft and Thompson 1987), we have been developing the CODER (COmposite Document Expert/effective/extended Retrieval) system (Fox and France 1987, Fox 1987) for the last three years. Though part of that effort deals with new approaches to automatic text analysis (Fox and Chen 1987), in the current context the most important aspect of CODER is that it is built as a distributed collection of &quot;expert&quot; modules (following the models discussed in Belkin et al. 1987), programmed in Prolog or C, to support flexible testing of various AI approaches to information retrieval.</Paragraph> <Paragraph position="6"> Weaver and France have developed modules for handling lexical and semantic relations, as well as a server module that provides access to our version of the contents of CDEL.</Paragraph> <Paragraph position="7"> In the future, a module will be added to interface CODER with the SNePS semantic network so that further experiments can be undertaken.</Paragraph> </Section> </Section> </Paper>